Risks & Ethics

Alignment

In one line

The problem of making AI systems do what humans actually want, safely and helpfully.

What does Alignment mean?

Alignment covers both technical methods, such as reinforcement learning from human feedback (RLHF) and Constitutional AI, and governance measures. Both aim to ensure that AI systems reflect human values and don't cause harm.

A real-world example

During RLHF training, human raters rank a safe refusal above a harmful answer to a dangerous request, and the model learns to prefer refusing such requests.
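
For readers who want to see the idea in code, here is a minimal sketch of the pairwise preference loss commonly used to train the reward model in RLHF. It is an illustration under stated assumptions, not any particular lab's implementation: the function name `preference_loss` and the example scores are hypothetical, and a real system would compute the scores with a neural network rather than hard-code them.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the response humans preferred scores higher
    than the one they rejected, and large when the order is reversed.
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward-model scores for two replies to a dangerous request.
refusal_score = 2.0      # safe refusal, preferred by human raters
compliance_score = -1.0  # harmful answer, rejected by human raters

# Reward model agrees with the raters: low loss.
loss_agrees = preference_loss(refusal_score, compliance_score)
# Reward model disagrees with the raters: high loss.
loss_disagrees = preference_loss(compliance_score, refusal_score)

print(f"agrees with raters:    {loss_agrees:.4f}")
print(f"disagrees with raters: {loss_disagrees:.4f}")
```

Minimizing this loss pushes the reward model to score refusals of dangerous requests above harmful answers. The model being aligned is then tuned to produce responses that the reward model scores highly.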

Related terms