Architecture

Mixture of Experts (MoE)

Also known as: moe, mixture-of-experts model

In one line

A model architecture that routes each input to a small subset of specialised "expert" sub-networks.

What does Mixture of Experts (MoE) mean?

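MoE models such as Mixtral and DeepSeek V3 (and, according to unconfirmed reports, GPT-4) have a very large total parameter count but activate only a fraction of those parameters for each token. A small learned router scores the experts and sends each token to the top few, so compute per token tracks the active parameters rather than the total. The result is quality closer to that of a large dense model at the per-token compute cost of a much smaller one, although every expert must still be held in memory.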
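
The routing step can be illustrated with a toy sketch. This is not how any particular model implements it: the names (moe_forward, make_expert, router_w), the tiny dimensions, and the choice of a single linear map per expert are all assumptions made for illustration. It shows top-k gating, where only the k highest-scoring experts run and their outputs are mixed by softmax weights.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    # Toy expert (assumption): a single dim x dim linear map.
    w = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

    return expert


def moe_forward(x, router_w, experts, top_k=2):
    # The router produces one score (logit) per expert for this token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_w]
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalise the chosen scores into mixing weights.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8  # illustrative sizes only
experts = [make_expert(DIM, rng) for _ in range(NUM_EXPERTS)]
router_w = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, router_w, experts, top_k=2)
print("experts used for this token:", chosen)
```

With 8 experts and top_k=2, each token pays for two expert evaluations instead of eight, which is the source of the gap between total and active parameters.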

A real-world example

Mixtral 8x7B has about 47B total parameters, but only about 13B are active per token, because each layer routes a token to just 2 of its 8 experts.
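
The two figures above can be roughly reproduced with back-of-the-envelope arithmetic. The configuration values below are assumptions, chosen to match the commonly published Mixtral 8x7B settings; normalisation weights are ignored, and the function name param_counts is hypothetical. The point is that attention, router, and embedding weights are shared by every token, while only top_k of the expert feed-forward blocks are used per layer.

```python
# Assumed configuration (not taken from this entry; treat as illustrative).
HIDDEN = 4096    # model width
FFN = 14336      # expert feed-forward width
LAYERS = 32
EXPERTS = 8      # experts per layer
TOP_K = 2        # experts used per token per layer
VOCAB = 32000
KV_DIM = 1024    # key/value projection width (grouped-query attention)


def param_counts(hidden, ffn, layers, experts, top_k, vocab, kv_dim):
    # Each expert is assumed to be a gated feed-forward block with three matrices.
    expert = 3 * hidden * ffn
    # Attention: query and output projections plus narrower key and value projections.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Router: one score per expert.
    router = hidden * experts
    # Input embeddings plus an untied output projection.
    embeddings = 2 * vocab * hidden

    shared = layers * (attention + router) + embeddings
    total = shared + layers * experts * expert
    active = shared + layers * top_k * expert
    return total, active


total, active = param_counts(HIDDEN, FFN, LAYERS, EXPERTS, TOP_K, VOCAB, KV_DIM)
print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B")
```

Under these assumptions the totals come out near 46.7B and 12.9B, consistent with the rounded 47B and 13B. The "8x7B" name is therefore not 8 times 7B = 56B, since the experts share the attention and embedding weights.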

Related terms