Techniques
Evaluation (Evals)
Also known as: evals, benchmark
In one line
A structured test of an AI's accuracy, safety, or usefulness on specific tasks.
What does Evaluation (Evals) mean?
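An eval runs a model against a fixed set of tasks and scores the results. Evals are essential for shipping AI products responsibly: they combine automated scoring (exact match, BLEU, LLM-as-judge) with human review, so regressions are caught before a change ships.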
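Exact match is the simplest of the automated scoring methods. The sketch below is only an illustration: the function names and the normalization rule (lowercasing and collapsing whitespace) are assumptions, not a standard definition.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty eval set")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Exact match suits tasks with one short correct answer. Free-form outputs usually need a softer metric such as BLEU, an LLM-as-judge, or human review.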
A real-world example
Running a 200-question eval set every time you change the system prompt, to check that nothing has regressed.
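A regression check like this could be sketched as follows. Everything here is hypothetical: `answer_fn` stands in for whatever calls your model with a given system prompt, scoring is plain case-insensitive exact match, and the 1% tolerance is an arbitrary choice.

```python
from typing import Callable

# Each item is (question, expected answer).
EvalSet = list[tuple[str, str]]


def run_eval(answer_fn: Callable[[str], str], eval_set: EvalSet) -> float:
    """Return the fraction of questions answered correctly (case-insensitive exact match)."""
    if not eval_set:
        raise ValueError("eval set is empty")
    correct = sum(
        answer_fn(question).strip().lower() == expected.strip().lower()
        for question, expected in eval_set
    )
    return correct / len(eval_set)


def has_regressed(baseline_score: float, candidate_score: float,
                  tolerance: float = 0.01) -> bool:
    """True if the candidate scores lower than the baseline by more than the tolerance."""
    return candidate_score < baseline_score - tolerance
```

In practice you would score the old prompt and the new prompt on the same eval set, then block the change if `has_regressed` returns True.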
Related terms
Large Language Model (LLM)
A neural network trained on huge text collections to predict the next word — the engine behind ChatGPT, Claude and Gemini.
Training
The expensive process of teaching a model by adjusting its weights on huge amounts of data.
Alignment
The problem of making AI do what humans actually want — safely and helpfully.

