Architecture

Attention Mechanism

Also known as: self-attention

In one line

The trick that lets a model weigh which parts of the input matter most when producing each word of output.

What does Attention Mechanism mean?

Self-attention computes relationships between every token and every other token, so the model can focus on the most relevant context regardless of distance in the text.

A real-world example

Understanding that "it" in "the trophy didn't fit in the suitcase because it was too big" refers to the trophy, not the suitcase.

Related terms