Architecture
Attention Mechanism
Also known as: self-attention
In one line
The trick that lets a model weigh which parts of the input matter most when producing each word of output.
What does Attention Mechanism mean?
Self-attention computes relationships between every token and every other token, so the model can focus on the most relevant context regardless of distance in the text.
A real-world example
Understanding that "it" in "the trophy didn't fit in the suitcase because it was too big" refers to the trophy, not the suitcase.

