
# Chapter 9: Multimodal and Generative AI

> Exploring how AI understands and creates images, audio, and video

***

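AI's capabilities now extend well beyond plain text. Multimodal AI can process text, images, audio, video, and other forms of data together, while generative AI can create entirely new content. This chapter introduces these fast-moving technical directions.

## In This Chapter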

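* **9.1 Multimodal Learning**: the basic concepts of multimodal learning
* **9.2 Image Generation and Diffusion Models**: a look inside models such as DALL-E and Stable Diffusion
* **9.3 Video and Audio Generation**: emerging technologies such as Sora and Suno
* **9.4 Native Omni-Modal Models and Embodied Intelligence**: practical applications of multimodal technology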