Architecture
Mixture of Experts (MoE)
Also known as: moe, mixture-of-experts model
In one line
A model architecture that routes each input to a small subset of specialised "expert" sub-networks.
What does Mixture of Experts (MoE) mean?
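MoE models (such as Mixtral and DeepSeek V3, and reportedly GPT-4, though OpenAI has not confirmed its architecture) have a very large total parameter count but activate only a fraction of those parameters for each token, giving strong quality at a lower compute cost per token.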
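To make the routing idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is illustrative only: in a real model the router scores come from a learned layer applied to each token, whereas here they are passed in by hand, and the toy experts and the names `route_top_k` and `moe_layer` are assumptions made for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalise their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy experts: each one just scales its input
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 0.3, 2.0]  # hand-picked router scores for one token

selected = [i for i, _ in route_top_k(scores)]
output = moe_layer(1.0, experts, scores)
```

Only the two selected experts are ever called, which is where the compute saving comes from: the other experts exist in memory but do no work for this token.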
A real-world example
Mixtral 8x7B has about 47B total parameters, but only ~13B are active per token, because each layer routes a token to just 2 of its 8 experts.
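As a rough back-of-the-envelope check, the share of parameters doing work on any one token can be computed from the two figures above. This is a sketch using the rounded numbers from this entry, not exact model statistics.

```python
# Rounded figures from the example above
total_params = 47e9   # all parameters that must be held in memory
active_params = 13e9  # parameters actually used for a single token

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.0%}")
```

The per-token compute is roughly that of a 13B dense model, while the memory footprint is still that of all 47B parameters.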
Related terms
Large Language Model (LLM)
A neural network trained on huge text collections to predict the next word — the engine behind ChatGPT, Claude and Gemini.
Transformer
The neural network architecture behind almost every modern LLM.
Inference
The act of running a trained model to get an answer — as opposed to training it.

