RLHF

Also known as: reinforcement learning from human feedback

In one line

Reinforcement Learning from Human Feedback: the fine-tuning technique that turned raw pretrained LLMs into helpful assistants.

What does RLHF mean?

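RLHF works in two stages. First, a reward model is trained on human preference rankings of model outputs, so that it learns to score a response the way a human rater would. Second, the LLM is fine-tuned with reinforcement learning to maximise that reward. It's how ChatGPT became useful and polite.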
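To make the first stage concrete, here is a minimal sketch of fitting a reward model to pairwise preferences. It is an illustration under simplifying assumptions, not how any production system is built: the reward model is a plain linear scorer, each response is reduced to two made-up features (helpfulness and rudeness), and the loss is the pairwise logistic loss often used for preference data. All names and numbers are hypothetical. The second stage, reinforcement learning against this reward, is left out.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Linear reward model: a weighted sum of response features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    """Pairwise logistic loss: -log sigmoid(r(chosen) - r(rejected))."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit the weights by stochastic gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(weights) = -(1 - sigmoid(margin)) * (chosen - rejected)
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical toy data: (response the rater chose, response the rater rejected),
# each described as [helpfulness, rudeness].
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(pairs, dim=2)

for chosen, rejected in pairs:
    print(reward(weights, chosen) > reward(weights, rejected))
```

After training, the model scores every chosen response above its rejected counterpart. In a real RLHF pipeline the linear scorer would be a neural network reading the full text, and its scores would then drive the reinforcement-learning stage.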

A real-world example

Human raters picked the better of two ChatGPT responses tens of thousands of times to build its reward model.

Related terms

Fine-tuning

Training an existing model further on your own examples to change how it behaves.

Training

The expensive process of teaching a model by adjusting its weights on huge amounts of data.

Alignment

The problem of making AI do what humans actually want — safely and helpfully.