About
I focus on data, evaluation, and post-training for large language models. I enjoy starting new research directions and building systems to support them.
My career in LLM research began with a 20% project at Google Brain, where I helped train the team's first large-scale autoregressive language model. I have since been a Research Engineer at Google Brain, a founding team member at Inflection, and a Principal Member of Technical Staff at Microsoft AI.
Since October 2024 I have been on sabbatical: traveling, staying at a forest monastery in Thailand (where I was temporarily ordained as a monk), working on side projects, practicing meditation, and learning Thai.
Selected Work
- Instruction tuning
I started the FLAN project at Google, which introduced instruction tuning, and co-led the first academic publication on the concept. I trained the initial model and organized the research effort to show that supervised fine-tuning on instruction-formatted datasets improves performance on unseen tasks. Instruction tuning has since become a standard first step in modern LLM post-training.
- Reasoning
I co-authored the paper that introduced chain-of-thought prompting. We showed that prompting models to break problems into explicit intermediate steps improves accuracy on difficult tasks. This work helped establish methods now used in reasoning-focused models and in scaling inference-time compute.
- Code generation
I created the Mostly Basic Python Problems (MBPP) dataset, released in August 2021 contemporaneously with HumanEval. It remains an early benchmark that researchers widely use to evaluate Python program synthesis.
- Foundational model building
While at Google Brain, I was a core contributor to PaLM and worked across LaMDA and GLaM, building pretraining data pipelines, running data-quality experiments, and setting up evaluation systems. At Inflection, I designed and ran pretraining, post-training, and safety experiments, and built pretraining and evaluation pipelines that took us from zero to shipping Inflection-2.
See Google Scholar for a complete list of publications.
Industry Experience
- Microsoft AI
Mar 2024 – Oct 2024. Principal Member of Technical Staff. Worked at Microsoft and on-site at OpenAI to launch GPT-4o realtime voice and bring it to Microsoft Copilot.
- Inflection AI
Apr 2022 – Mar 2024. Founding Member of Technical Staff. Designed and ran post-training and safety experiments; built data and evaluation pipelines for pretraining and post-training; helped ship Inflection-1, Inflection-2, and Pi.
- Google
Aug 2016 – Apr 2022. Research Engineer at Google Brain. Created pretraining data pipelines, ran data-quality ablations, and designed evaluation systems for LaMDA, GLaM, and PaLM. Core contributor to PaLM.
Before moving into LLM research, I worked as an ML Engineer and Data Scientist at Google Ads and Whisper.