Techniques

Distillation

Also known as: model distillation, knowledge distillation

In one line

Training a small, fast model to mimic a large, slow one.

What does Distillation mean?

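Distillation transfers knowledge from a large "teacher" model to a smaller "student" model that is cheaper to run. Rather than learning only from the original training labels, the student is trained to reproduce the teacher's outputs: either the teacher's full probability distribution over possible answers (so-called soft labels) or text the teacher generates. Google has said Gemini 1.5 Flash was distilled from Gemini 1.5 Pro; Claude Haiku plays a similar role as the smaller, faster sibling of larger Claude models.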
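To make the soft-label idea concrete, here is a minimal sketch of the classic distillation loss for a single classification example, in the style of Hinton et al. (2015). Both models' logits are softened with a temperature, the student is penalised for diverging from the teacher (KL divergence), and that term is blended with ordinary cross-entropy on the true label. The function names and the `temperature` and `alpha` defaults are illustrative assumptions, not values used by any particular production model.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target term (match the teacher) and a hard-label term."""
    # Soft targets: the student tries to match the teacher's softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Multiply by T^2 so this term's gradients stay comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the true label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


teacher = [4.0, 1.0, 0.5]
close_student = distillation_loss([3.9, 1.1, 0.4], teacher, label=0)
far_student = distillation_loss([0.5, 1.0, 4.0], teacher, label=0)
print(close_student < far_student)  # True: the student that mimics the teacher scores better
```

The temperature matters because a well-trained teacher's raw distribution is usually very peaked; softening it exposes how the teacher ranks the wrong answers, which is much of the extra signal the student learns from.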

A real-world example

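DeepSeek fine-tuned smaller open models, based on Qwen and Llama, on reasoning traces generated by its much larger DeepSeek-R1 model. The resulting distilled models are far cheaper to run but still solve maths problems well.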
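When the teacher is a language model, distillation often works at the level of whole outputs: you collect the teacher's generated answers and fine-tune the student on them as ordinary training text. The sketch below shows only the data-collection half, under stated assumptions. `teacher_generate` and `extract_answer` are hypothetical stand-ins (here a toy arithmetic "teacher"), and the step that discards traces with a wrong final answer is a common filtering practice assumed for illustration, not a description of DeepSeek's actual pipeline.

```python
from typing import Callable


def build_distillation_set(
    problems: list[tuple[str, str]],
    teacher_generate: Callable[[str], str],
    extract_answer: Callable[[str], str],
    samples_per_problem: int = 4,
) -> list[dict[str, str]]:
    """Collect teacher traces as (prompt, completion) pairs for fine-tuning a student."""
    dataset = []
    for prompt, reference in problems:
        for _ in range(samples_per_problem):
            trace = teacher_generate(prompt)  # reasoning plus final answer from the teacher
            # Assumed filter: keep only traces whose final answer matches the reference.
            if extract_answer(trace) == reference:
                dataset.append({"prompt": prompt, "completion": trace})
    return dataset


def fake_teacher(prompt: str) -> str:
    """Hypothetical stand-in for calling a large reasoning model."""
    a, b = map(int, prompt.removeprefix("What is ").rstrip("?").split(" + "))
    return f"<think>Add {a} and {b}.</think> Answer: {a + b}"


def extract_answer(trace: str) -> str:
    """Take whatever follows the last 'Answer: ' marker."""
    return trace.rsplit("Answer: ", 1)[-1]


# The second reference answer is deliberately wrong, so its traces get filtered out.
problems = [("What is 2 + 3?", "5"), ("What is 10 + 7?", "18")]
data = build_distillation_set(problems, fake_teacher, extract_answer, samples_per_problem=2)
print(len(data))  # 2
```

The resulting list would then feed a standard supervised fine-tuning run on the student model, which is not shown here.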

Related terms