8. Multimodal

Overview

This section covers multimodal models that combine text with vision, audio, and actions, and gives a brief taxonomy with references.