Techniques
Distillation
Also known as: model distillation
In one line
Training a small, fast model to mimic a large, slow one.
What does Distillation mean?
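Distillation transfers knowledge from a big "teacher" model to a small "student" that's cheaper to run. The student is trained to reproduce the teacher's outputs (its predicted probabilities or its generated text) rather than learning only from raw labelled data. Google has described Gemini 1.5 Flash as distilled from the larger Gemini 1.5 Pro; small tiers in other families, such as Claude Haiku, are often assumed to be built in a similar way, though vendors rarely publish the details.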
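To make "mimic the teacher" concrete, here is a minimal sketch of one common training signal: a soft-target loss, where both models' logits are softened with a temperature and the student is penalised by the KL divergence from the teacher's distribution. The function names, the temperature of 2.0, and the toy logits are illustrative assumptions, not any particular vendor's recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is a common convention that keeps gradient
    magnitudes comparable across temperatures (assumed here, not required).
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # the student's current guess
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Toy logits over three classes (made-up numbers for illustration).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # prefers a different class

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

A student that agrees with the teacher gets a loss near zero, and one that disagrees gets a larger loss. In practice this term is usually minimised with gradient descent, often mixed with an ordinary loss on the true labels.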
A real-world example
DeepSeek fine-tuned smaller open models (based on Qwen and Llama) on reasoning traces generated by its R1 model, producing distilled versions that still solve maths problems.
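Distilling from reasoning traces amounts to building a fine-tuning dataset out of the teacher's written-out solutions. The sketch below shows that data-collection step only, under loud assumptions: `toy_teacher` is a hypothetical stand-in for a large reasoning model, the addition questions are made up, and the filter that keeps only traces ending in the correct answer is one plausible design choice, not a description of DeepSeek's actual pipeline.

```python
def toy_teacher(question):
    """Hypothetical teacher: returns (reasoning trace, final answer).

    It deliberately makes a mistake when the first operand is 10,
    so the filtering step below has something to reject.
    """
    a, b = question
    answer = a + b + (1 if a == 10 else 0)
    trace = f"Add {a} and {b}: {a} + {b} = {answer}."
    return trace, answer

def build_distillation_set(questions, gold_answers, teacher):
    """Collect teacher traces as prompt/completion pairs for student fine-tuning."""
    dataset = []
    for question, gold in zip(questions, gold_answers):
        trace, predicted = teacher(question)
        if predicted != gold:
            continue  # drop traces that reach a wrong answer
        prompt = f"What is {question[0]} + {question[1]}?"
        dataset.append({"prompt": prompt, "completion": trace})
    return dataset

questions = [(2, 3), (10, 7)]
gold_answers = [5, 17]
dataset = build_distillation_set(questions, gold_answers, toy_teacher)
```

The student would then be fine-tuned on `dataset` with an ordinary next-token objective, learning to write out the reasoning as well as the final answer.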

