A Fully Open Language Model Recipe

In the fast-moving, hype-filled world of Large Language Models (LLMs), the term "Open Source" has faced a severe identity crisis. We live in an era dominated by "Open Weights". Labs like Mistral or Alibaba's Qwen team give us the final dish, the trained model, but they keep the ingredient list (the exact dataset) and the cooking method (the training logs and specific hyperparameters) as closely guarded trade secrets.

For researchers and developers, this creates a significant barrier. It is akin to a chef tasting a Michelin-star meal but having no idea how to recreate it. You can eat it, you can plate it differently, but you cannot learn the science of cooking from it. 

Olmo 3, from the Allen Institute for AI (Ai2), is not just another model drop; it is a scientific artifact. Unlike its competitors, Olmo 3 releases everything: the weights, the massive training data (Dolma 3), the full training code, and the detailed logs [2]. Based on Cameron R. Wolfe's deep technical analysis of the Olmo 3 release [1], we can finally reconstruct the exact, step-by-step recipe for building a modern, reasoning-capable Large Language Model.

Here is the fully open language model recipe for Olmo 3.

Phase 1: The Ingredients (Data Curation & Dolma 3)

The difference between a hallucinating chatbot and a robust reasoning engine is rarely the model architecture; it is almost always the data. Olmo 3 relies on a massive corpus called Dolma 3. But the secret isn't just volume; it's the mix and the preparation.

1. Token-Constrained Mixing (The Ratio)

You cannot simply "download the internet" and expect a smart model. The recipe requires a precise balance. Olmo 3 uses a technique called Swarm Optimization to find the perfect mathematical ratio between:

  • Web Content (Common Crawl): General knowledge.
  • Code (The Stack): Logic and reasoning structure.
  • Academic PDFs: Deep technical knowledge (cutoff Dec 2024).
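
To make this concrete, here is a minimal, purely illustrative sketch of a swarm-style search over mixture ratios. The `proxy_score` function, population size, and target ratios are hypothetical stand-ins: in the real recipe, candidate mixtures are scored with small ablation runs against downstream benchmarks, not a closed-form formula.

```python
import random

SOURCES = ["web", "code", "academic_pdf"]

def proxy_score(ratios):
    """Hypothetical stand-in for a small ablation run that scores a data mix.
    In practice this would train a tiny proxy model on the mixture and
    evaluate it on held-out benchmarks."""
    target = {"web": 0.6, "code": 0.25, "academic_pdf": 0.15}  # illustrative only
    return -sum((ratios[s] - target[s]) ** 2 for s in SOURCES)

def normalize(raw):
    total = sum(raw.values())
    return {s: v / total for s, v in raw.items()}

def swarm_search(generations=50, population_size=16):
    """Loose sketch of a swarm-style search: score a population of candidate
    mixtures, then resample the next generation around the best one so far."""
    population = [normalize({s: random.random() for s in SOURCES})
                  for _ in range(population_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        for ratios in population:
            score = proxy_score(ratios)
            if score > best_score:
                best, best_score = ratios, score
        population = [
            normalize({s: max(1e-6, best[s] + random.gauss(0, 0.05)) for s in SOURCES})
            for _ in range(population_size)
        ]
    return best

print(swarm_search())  # converges toward the (illustrative) target mixture
```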

Term to Know: Ablation Studies
Throughout the Olmo 3 recipe, you will frequently hear about "Ablations." In AI science, this means systematically removing one specific ingredient (e.g., "What happens if we remove all math data?") to measure its exact impact on the final flavor. Olmo 3 is built on thousands of these small tests.

2. Quality-Aware Upsampling

Not all tokens are created equal; a textbook on calculus is worth more than a thousand Reddit comments. Olmo 3 introduces Quality-Aware Upsampling. Instead of sampling every document equally, the recipe identifies the top 5% highest-quality data and repeats it multiple times (up to 7x) during training. This ensures the model learns the "truth" more deeply than the noise.
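
A minimal sketch of that upsampling step, assuming the quality scores come from a separate quality classifier (not shown); the 5% and 7x values simply mirror the description above.

```python
def quality_aware_upsample(docs, top_fraction=0.05, max_repeats=7):
    """Repeat the highest-quality documents several times in the training stream.
    `docs` is a list of (text, quality_score) pairs; the scoring model itself
    is assumed to exist elsewhere."""
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    top = [text for text, _ in ranked[:cutoff]]
    rest = [text for text, _ in ranked[cutoff:]]
    # Top documents are seen up to `max_repeats` times; everything else once.
    return top * max_repeats + rest

stream = quality_aware_upsample([("calculus textbook", 0.97),
                                 ("forum rant", 0.31),
                                 ("news story", 0.62)])
```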

Phase 2: The Utensils (Architecture & Stability)

Once you have the ingredients, you need the right equipment. Olmo 3 generally follows the standard Transformer decoder architecture (similar to Llama), but with specific modifications designed for stability and efficiency.

  • No Bias Terms: The model removes bias terms from the network to improve training stability.
  • SwiGLU & QK-Norm: Modern activation functions and normalization techniques to keep the flow of information smooth.
  • Sliding Window Attention (SWA): To stay efficient, the model does not attend to every previous token in every layer; it restricts most layers to a sliding window of recent tokens, reducing memory and bandwidth usage during inference. A minimal sketch of two of these components follows.
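
As a rough illustration, here is a minimal PyTorch sketch of a bias-free SwiGLU feed-forward block and QK normalization. The dimensions are placeholders rather than Olmo 3's actual configuration, and the sliding-window masking is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Bias-free SwiGLU feed-forward block (sizes are illustrative)."""
    def __init__(self, d_model=512, d_hidden=1376):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        # silu(W_gate x) gates (W_up x), then project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class QKNorm(nn.Module):
    """Normalize queries and keys before the attention dot product, keeping
    attention logits bounded and training stable (needs PyTorch >= 2.4 for nn.RMSNorm)."""
    def __init__(self, head_dim=64):
        super().__init__()
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, q, k):
        return self.q_norm(q), self.k_norm(k)

x = torch.randn(2, 16, 512)
print(SwiGLU()(x).shape)  # torch.Size([2, 16, 512])
```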

Phase 3: The Bake (Midtraining & Annealing)

Training an LLM isn't a straight line from start to finish. It is a dynamic process involving temperature changes and stage-gates.

1. Microannealing & Integration Tests

Before committing to a massive, expensive training run, Olmo 3 uses Microannealing. This involves running many small, parallel experiments to test new data sources rapidly. It’s like testing a spice on a single spoonful of soup before adding it to the whole pot. Only the data streams that pass these tests are combined into the final Dolmino Mix for the midtraining phase.
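
In rough Python terms, the screening step might look like the sketch below. `run_short_anneal` is a placeholder for the actual small annealing run, and the acceptance rule is simplified to "beats the current baseline."

```python
def microanneal_screen(candidate_sources, baseline_score, run_short_anneal):
    """Screen candidate data sources with short, cheap annealing runs and
    keep only those that improve on the baseline benchmark score."""
    accepted = []
    for source in candidate_sources:
        score = run_short_anneal(source)  # small run mixing `source` into the anneal
        if score > baseline_score:
            accepted.append(source)
    return accepted  # these streams get combined into the midtraining mix
```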

Term to Know: Learning Rate Annealing
In metallurgy, annealing involves heating metal and cooling it slowly to toughen it. In AI, this means starting with a high "learning rate" (making big changes to the model's brain) and linearly decaying it to zero. This "polishes" the model’s weights before the final stage.
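
A minimal sketch of a linear anneal to zero (the peak learning rate here is an arbitrary placeholder, not Olmo 3's actual value):

```python
def linear_anneal(step, anneal_steps, peak_lr=3e-4):
    """Decay the learning rate linearly from its peak to zero over the anneal."""
    progress = min(step / anneal_steps, 1.0)
    return peak_lr * (1.0 - progress)

# The final steps make ever-smaller updates, "polishing" the weights.
print([round(linear_anneal(s, 10), 6) for s in range(11)])
```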

2. The Long Context Stretch (YaRN)

Olmo 3 is designed to handle long documents, but training on long sequences (32k+ tokens) from day one is prohibitively expensive. The solution? Context Extension.
After the main pretraining run, the model undergoes a dedicated long-context phase using YaRN (Yet another RoPE extension), applied only to the full-attention layers. This mathematical trick allows the model to generalize to much longer inputs without having seen billions of long examples during the initial phase.
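
The snippet below is a heavily simplified sketch of the idea behind YaRN-style RoPE scaling: long-wavelength (low-frequency) rotary components are interpolated to cover the longer context, while short-wavelength components are left untouched. The real YaRN formulation (NTK-by-parts plus an attention temperature term) is more involved, and the constants here are illustrative.

```python
import math

def yarn_like_frequencies(head_dim=64, base=10000.0,
                          original_ctx=4096, target_ctx=32768,
                          low=1.0, high=32.0):
    """Return per-dimension RoPE frequencies blended between the original
    and a position-interpolated version, based on how many full rotations
    each dimension makes within the original context window."""
    scale = target_ctx / original_ctx
    freqs = []
    for i in range(0, head_dim, 2):
        freq = 1.0 / (base ** (i / head_dim))
        wavelength = 2 * math.pi / freq
        rotations = original_ctx / wavelength
        # rotations >= high: keep the original frequency (local detail).
        # rotations <= low:  fully interpolate (long-range positions).
        ramp = min(1.0, max(0.0, (rotations - low) / (high - low)))
        freqs.append(ramp * freq + (1.0 - ramp) * (freq / scale))
    return freqs

print(yarn_like_frequencies()[:4])
```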

Phase 4: The Flavor Profile (Post-Training & Delta Learning)

At this stage, the base model is "cooked," but it is unseasoned. It needs to be seasoned to follow instructions (Instruct) or to solve puzzles (Think). This happens via SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization).

However, Olmo 3 introduces a unique twist called Delta Learning.

Term to Know: Delta Learning
Standard preference training teaches a model "Option A is good, Option B is bad." Delta Learning maximizes the signal by ensuring the gap between options is huge. It learns best when shown a "Chosen" response from a very smart model (like Qwen-32B) and a "Rejected" response from a much weaker model (like Qwen-0.6B). Focusing on the delta (difference) creates a much stronger learning signal than comparing two similar answers.
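
A minimal sketch of how such wide-gap preference pairs could be assembled is shown below; `strong_generate` and `weak_generate` are placeholder callables standing in for, say, a 32B and a 0.6B model.

```python
def build_delta_pairs(prompts, strong_generate, weak_generate):
    """Build preference pairs with a deliberately large quality gap:
    the chosen response comes from a strong model, the rejected one
    from a much weaker model. The pairs then feed a standard DPO trainer."""
    pairs = []
    for prompt in prompts:
        pairs.append({
            "prompt": prompt,
            "chosen": strong_generate(prompt),    # e.g. Qwen-32B output
            "rejected": weak_generate(prompt),    # e.g. Qwen-0.6B output
        })
    return pairs
```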

Phase 5: The Secret Sauce (Reasoning via RLVR & GRPO)

This is the most cutting-edge part of the Olmo 3 recipe. To build the Olmo 3 Think model (which rivals proprietary reasoning models), Ai2 moves beyond simple imitation and uses Reinforcement Learning (RL).

RLVR (Reinforcement Learning with Verifiable Rewards)

How do you reward a model? For creative writing, it's subjective. But for Math and Code, it's binary: the code runs, or it doesn't. The answer is correct, or it isn't.
Olmo 3 uses RLVR to train the model on math and coding problems where the "reward" is verifiable. This allows the model to explore complex "Chains of Thought" and get feedback on the final result.
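
A toy sketch of what a verifiable reward function could look like; the problem schema is an assumption for illustration, and real pipelines run candidate code in isolated sandboxes rather than a bare `exec`.

```python
def verifiable_reward(problem, model_answer):
    """Return 1.0 if the model's answer can be verified as correct, else 0.0.
    `problem` is assumed to carry a ground-truth answer or unit tests."""
    if problem["kind"] == "math":
        return 1.0 if model_answer.strip() == problem["ground_truth"] else 0.0
    if problem["kind"] == "code":
        namespace = {}
        try:
            exec(model_answer, namespace)           # define the candidate function
            exec(problem["unit_tests"], namespace)  # asserts raise on failure
            return 1.0
        except Exception:
            return 0.0
    return 0.0

print(verifiable_reward({"kind": "math", "ground_truth": "42"}, " 42 "))  # 1.0
```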

GRPO (Group Relative Policy Optimization)

Traditional RL (like PPO) requires a massive secondary model called a "Critic" to judge every step, which is computationally heavy.
Olmo 3 adopts GRPO. Instead of a Critic, GRPO samples a group of outputs (e.g., 8 different solutions to the same math problem) and compares them against each other. It effectively asks, "Which of these 8 attempts was the best?" and reinforces that path relative to the group. This is far more efficient.
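
The core of the group-relative trick fits in a few lines; this is a generic sketch of the advantage computation, not Ai2's exact implementation.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each sample's reward against its own group (no learned critic):
    above-average attempts get positive advantages, below-average get negative."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

# 8 attempts at one math problem, 3 of which the verifier marked correct.
print(group_relative_advantages([1, 0, 0, 1, 0, 0, 1, 0]))
```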

The Infrastructure: OlmoRL

You can't cook a banquet on a camping stove. Because RL training involves generating massive amounts of text (rollouts), Ai2 built OlmoRL. This infrastructure uses fully asynchronous rollouts, ensuring the GPU never sits idle while waiting for the model to "think," cutting training time from 15 days down to just 6.
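
As a toy illustration of the asynchronous-rollout idea (not Ai2's actual OlmoRL code), a producer keeps generating rollouts while a consumer trains on whatever is ready, so neither side blocks on the other:

```python
import asyncio
import random

async def generate_rollout(prompt):
    """Stand-in for sampling a long chain of thought from the current policy."""
    await asyncio.sleep(random.uniform(0.1, 0.3))  # simulated generation latency
    return {"prompt": prompt, "text": "..."}

async def rollout_worker(queue, prompts):
    for prompt in prompts:
        await queue.put(await generate_rollout(prompt))

async def trainer(queue, total):
    for _ in range(total):
        rollout = await queue.get()   # train on rollouts as soon as they arrive
        # ...a GRPO update would happen here...

async def main():
    queue = asyncio.Queue()
    prompts = [f"problem {i}" for i in range(8)]
    await asyncio.gather(rollout_worker(queue, prompts),
                         trainer(queue, len(prompts)))

asyncio.run(main())
```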

Phase 6: The Experimental Dish (RL-Zero)

Finally, Olmo 3 explores a radical concept: RL-Zero.

Usually, you teach a model to talk (SFT) before you teach it to reason (RL). RL-Zero asks: "Can we skip the teaching?"
It applies Reinforcement Learning directly to the raw base model. The result? The model begins to develop reasoning capabilities purely through trial and error, provided the rewards are verifiable. This setup also serves as a check on "data contamination," confirming that the model isn't just memorizing answers but actually learning to solve problems.

Conclusion

Why care about a Fully Open Recipe when Gemma 3 or DeepSeek already exist? Because science should be free, not rented.
“Open weights” and proprietary APIs still create dependency and concentrate control over AI in the hands of a few corporations. Black-box models hide assumptions, biases, and constraints we cannot inspect or question.

Olmo 3 proves that AI progress doesn’t require secrecy. By opening the entire pipeline (data, logs, and failures), it returns control to the community and makes AI a reproducible science again.

The recipe is open so the future of AI can belong to everyone.

References

[1] https://cameronrwolfe.substack.com/p/olmo-3
[2] https://allenai.org/blog/olmo3

