Core Concepts
Multimodal AI
Also known as: multi-modal
In one line
A model that can process more than one type of input (e.g. text, images, and audio), often within the same request.
What does Multimodal AI mean?
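Multimodal AI refers to models that accept, and sometimes produce, several kinds of data (modalities) rather than text alone. Such a model can, for example, look at a photo of a receipt and answer questions about it, or listen to a meeting and produce structured minutes.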
A real-world example
Uploading a screenshot to ChatGPT and asking "why is my code broken?"

