Core Concepts

Multimodal AI

Also known as: multi-modal

In one line

A model that understands more than one type of input — e.g. text, images, and audio together.

What does Multimodal AI mean?

Multimodal models can, for example, look at a photo of a receipt and answer questions about it, or listen to a meeting and produce structured minutes.

A real-world example

Uploading a screenshot to ChatGPT and asking "why is my code broken?"

Related terms