Core Concepts
Training
In one line
The expensive process of teaching a model by adjusting its weights on huge amounts of data.
What does Training mean?
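Training adjusts a model's parameters (its weights) by gradient descent: the model makes predictions on training data, a loss function measures how wrong they are, and each weight is nudged in the direction that reduces that loss. For large language models this loop runs over trillions of tokens. Pretraining creates a foundation model; fine-tuning starts from an already trained model and adapts it to a task, usually with far less data.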
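As a rough illustration of the idea, here is a minimal sketch of gradient descent on a toy model with just two parameters. Everything in it is an assumption made for illustration: the `train` function, the straight-line model, the learning rate, and the synthetic data. Real training uses the same loop at vastly larger scale, with billions of parameters and automatic differentiation.

```python
def train(data, w=0.0, b=0.0, lr=0.02, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    w and b are the starting weights: zeros for training from scratch,
    or previously learned values when fine-tuning.
    """
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step each weight against its gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# "Pretraining": many steps from scratch on a general dataset (y = 3x + 1).
pretrain_data = [(float(x), 3.0 * x + 1.0) for x in range(-5, 6)]
w_pre, b_pre = train(pretrain_data)

# "Fine-tuning": a few steps on a related task (y = 3.5x + 1),
# starting from the pretrained weights instead of zeros.
task_data = [(float(x), 3.5 * x + 1.0) for x in range(-5, 6)]
w_ft, b_ft = train(task_data, w=w_pre, b=b_pre, epochs=20)

# For comparison: the same 20 steps on the new task, starting from zeros.
w_scratch, b_scratch = train(task_data, epochs=20)
```

In this toy setup, the fine-tuned weights land close to the new task's values after only 20 steps, while the run that starts from zeros is still noticeably off on `b`. That is the practical appeal of fine-tuning: most of the work has already been paid for during pretraining.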
A real-world example
Meta reportedly spent roughly $500 million on GPU time to train Llama 3.

