Architecture

Mixture of Experts (MoE)

Also known as: moe, mixture-of-experts model

In one line

A model architecture that routes each input to a small subset of specialised "expert" sub-networks.

What does Mixture of Experts (MoE) mean?

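MoE models such as Mixtral and DeepSeek V3 (and, according to unconfirmed reports, GPT-4) have a very large total parameter count but activate only a fraction of those parameters for each token. This gives strong quality at a lower compute cost per token, although all parameters must still be held in memory.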
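The routing idea can be sketched in a few lines of plain Python. This is a toy illustration, not any particular model's implementation: the names `top_k_route`, `moe_layer`, `toy_router` and the scaling "experts" are hypothetical, and taking a softmax over only the selected experts' scores is one common weighting choice among several.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and normalise their weights to sum to 1."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(token, experts, router, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    routes = top_k_route(router(token), k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # the other experts are never evaluated
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]

# Hypothetical toy setup: 8 "experts" that each scale the input differently.
NUM_EXPERTS = 8
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

def toy_router(token):
    # Assumed scoring rule for illustration only; a real router is a learned layer.
    return [math.sin((i + 1) * sum(token)) for i in range(NUM_EXPERTS)]

output, used = moe_layer([1.0, 2.0], experts, toy_router, k=2)
print("experts used:", used)
```

Only two of the eight expert functions are called for this token, which is where the per-token compute saving comes from.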

Mixtral 8x7B, for example, has about 47B total parameters, but only about 13B are active per token, because its router selects 2 of the 8 experts in each layer.
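A back-of-the-envelope calculation shows why "8x7B" does not add up to 56B. The sketch below assumes 8 experts with 2 active per token (a property of Mixtral not stated above) and uses the rounded 47B and 13B figures, so the results are rough estimates rather than official numbers. The helper `split_params` is hypothetical.

```python
def split_params(total_b, active_b, num_experts, experts_per_token):
    """Estimate shared vs per-expert parameters (in billions).

    Solves the two equations:
        total  = shared + num_experts       * per_expert
        active = shared + experts_per_token * per_expert
    """
    per_expert = (total_b - active_b) / (num_experts - experts_per_token)
    shared = active_b - experts_per_token * per_expert
    return shared, per_expert

# Rounded figures from the text; 8 experts and top-2 routing are assumptions.
shared, per_expert = split_params(47.0, 13.0, num_experts=8, experts_per_token=2)
print(f"shared ~ {shared:.1f}B, per expert ~ {per_expert:.1f}B")
# prints: shared ~ 1.7B, per expert ~ 5.7B
```

Under these assumptions, the attention layers and embeddings are shared by all experts and counted once, so the total is well below 8 x 7B.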

A neural network trained on huge text collections to predict the next word — the engine behind ChatGPT, Claude and Gemini.

The neural network architecture behind almost every modern LLM.

The act of running a trained model to get an answer — as opposed to training it.