Risks & Ethics

Alignment

In one line

The problem of getting AI systems to do what humans actually want, safely and helpfully.

What does Alignment mean?

Alignment covers both technical methods (such as RLHF and Constitutional AI) and governance measures intended to ensure that AI systems reflect human values and do not cause harm.

For example, RLHF training can teach a model to refuse dangerous requests.

Reinforcement Learning from Human Feedback (RLHF): a training technique, based on human judgments of model outputs, that helped turn raw LLMs into helpful assistants.
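
To make the "human feedback" part concrete, here is a minimal sketch of one common ingredient of RLHF: a pairwise preference loss used when training a reward model. It assumes a human labeler compared two responses and picked one. The function name and the reward scores are hypothetical and chosen for illustration only, and real systems compute these scores with a neural network rather than by hand.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for one human comparison.

    The loss is small when the response the human preferred gets the
    higher reward score, and large when the ranking is the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # Computes -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # split into two branches so exp() never receives a large positive value.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores: the labeler preferred a refusal of a dangerous request.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
disagrees_with_human = preference_loss(reward_chosen=-1.0, reward_rejected=2.0)
print(agrees_with_human < disagrees_with_human)
```

Minimizing this loss over many comparisons nudges the reward model to score preferred responses higher. That reward signal can then be used in a reinforcement learning step to adjust the assistant itself.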