Maarten Bosma

AI researcher focused on large language models.

About

I focus on data, evaluation, and post-training for large language models. I enjoy starting new research directions and building systems to support them.

My career in LLM research began with a 20% project at Google Brain, where I helped train the team's first large-scale autoregressive language model. I have since been a Research Engineer at Google Brain, a founding team member at Inflection, and a Principal Member of Technical Staff at Microsoft AI.

Since October 2024 I have been on sabbatical. I have been traveling, including a stay at a forest monastery in Thailand, where I was temporarily ordained as a monk. I have also been working on side projects, practicing meditation, and learning Thai.

Selected Work

  • Instruction tuning

    I started the FLAN project at Google, which introduced instruction tuning, and co-led the first academic publication on the concept. I trained the initial model and organized the research effort to show that supervised fine-tuning on instruction-formatted datasets improves performance on unseen tasks. Instruction tuning has since become a standard first step in modern LLM post-training.

  • Reasoning

    Co-author of the paper that introduced chain-of-thought prompting. We showed that prompting models to work through problems in explicit intermediate steps improves accuracy on difficult tasks. This work helped establish methods used in reasoning-focused models and in scaling inference-time compute.

  • Code generation

    I created the Mostly Basic Python Problems (MBPP) dataset, released in August 2021 contemporaneously with HumanEval. It became an early benchmark that researchers still widely use to evaluate Python program synthesis.

  • Foundational model building

    While at Google Brain, I was a core contributor to PaLM and worked across LaMDA and GLaM, building pretraining data pipelines, running data-quality experiments, and setting up evaluation systems. At Inflection, I designed and ran pretraining, post-training, and safety experiments, and built pretraining and evaluation pipelines that took us from zero to shipping Inflection-2.

See Google Scholar for a complete list of publications.

Industry Experience

  • Microsoft AI

    Mar 2024 – Oct 2024

    Principal Member of Technical Staff. Worked at Microsoft and on-site at OpenAI to launch GPT-4o realtime voice and integrate it into Microsoft Copilot.

  • Inflection AI

    Apr 2022 – Mar 2024

    Founding Member of Technical Staff. Designed and ran post-training and safety experiments; built data and evaluation pipelines for pretraining and post-training; helped ship Inflection-1, Inflection-2, and Pi.

  • Google

    Aug 2016 – Apr 2022

    Research Engineer at Google Brain. Created pretraining data pipelines, ran data-quality ablations, and designed evaluation systems for LaMDA, GLaM, and PaLM. Core contributor to PaLM.

Before working on LLM research, I worked as an ML Engineer and Data Scientist at Google Ads and Whisper.