Performance

Latency

In one line

How long the AI takes to start (or finish) replying.

What does Latency mean?

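Latency is the delay between sending a request and getting the response. For AI models it is usually measured either to the first piece of the reply (time to first token) or to the end of the full reply. It matters most for real-time chat, voice, and agents. Reasoning models trade latency for accuracy; smaller models like Gemini Flash trade a bit of quality for speed.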

A real-world example

A voice assistant needs to respond in under 500 ms to feel natural; a batch summariser can tolerate 30 seconds.
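
To make the "start" versus "finish" distinction concrete, here is a minimal Python sketch that times both for a streamed reply. It is an illustration under stated assumptions, not any provider's API: `measure_latency` and `fake_model_stream` are hypothetical names, and the fake stream simply sleeps to imitate a model. In practice you would pass in whatever iterator of text chunks your client library returns.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_latency(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Return (time_to_first_chunk, total_time, full_text), times in seconds."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in stream:
        if first_chunk_at is None:
            # Moment the reply *starts* arriving.
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    # Moment the reply *finishes*.
    total = time.perf_counter() - start
    if first_chunk_at is None:
        # Empty stream: nothing ever arrived, so both numbers coincide.
        first_chunk_at = total
    return first_chunk_at, total, "".join(parts)


def fake_model_stream(delay_before_first: float = 0.05,
                      delay_between: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(delay_before_first)  # simulated "thinking" before the first token
    for piece in ["Hello", ", ", "world", "!"]:
        yield piece
        time.sleep(delay_between)   # simulated gap between tokens


if __name__ == "__main__":
    ttft, total, text = measure_latency(fake_model_stream())
    print(f"reply: {text!r}")
    print(f"time to first chunk: {ttft * 1000:.0f} ms")
    print(f"total time: {total * 1000:.0f} ms")
    # Budget taken from the voice-assistant example above.
    print("within 500 ms voice budget:", ttft < 0.5)
```

For a voice assistant the first number is usually the one to watch, since the user hears the reply as soon as it starts; for a batch summariser only the total matters.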

Related terms