Techniques

RLHF

Also known as: reinforcement learning from human feedback

In one line

Reinforcement Learning from Human Feedback: the fine-tuning technique that turned raw, pretrained LLMs into helpful assistants.

What does RLHF mean?

RLHF first trains a reward model on human preference rankings of model outputs, then fine-tunes the LLM with reinforcement learning to maximise the score that reward model assigns. It's how ChatGPT became useful and polite.
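
To make the reward-model step more concrete, here is a minimal sketch in Python. It assumes a pairwise (Bradley-Terry style) loss, which is a common choice for learning from preference comparisons, and it replaces the neural network with a tiny linear model. The two features (helpfulness and rudeness), the toy data, and all function names are hypothetical and for illustration only. The reinforcement-learning fine-tuning step is not shown.

```python
import math


def reward(weights, features):
    """Linear stand-in for a reward model: a weighted sum of features."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit weights so each human-preferred response scores above the rejected one."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Derivative of the loss with respect to the margin: -sigmoid(-margin)
            grad_margin = -1.0 / (1.0 + math.exp(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_margin * (chosen[i] - rejected[i])
    return weights


# Hypothetical toy data: each response is described by [helpfulness, rudeness].
# Each pair is (response the rater preferred, response the rater rejected).
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
    ([0.9, 0.2], [0.9, 0.8]),
]

weights = train_reward_model(pairs, n_features=2)
print("learned weights:", weights)
```

On this toy data the learned weight for helpfulness ends up positive and the weight for rudeness negative, so the model scores each preferred response above its rejected alternative. In a real system the reward model is typically a neural network that reads the full prompt and response rather than hand-picked features.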

A real-world example

To build ChatGPT's reward model, human raters compared alternative responses to the same prompt and marked which was better, tens of thousands of times.

Related terms