# Chapter 9: Multimodal and Generative AI

> Exploring how AI understands and creates images, audio, and video

***

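AI's capabilities now extend well beyond plain text. Multimodal AI can process several forms of data at once, including text, images, audio, and video, while generative AI can create entirely new content. This chapter introduces these exciting technical directions.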

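## In This Chapter

* **9.1 Multimodal Learning**: understand the basic concepts of multimodal learning
* **9.2 Image Generation and Diffusion Models**: examine models such as DALL-E and Stable Diffusion
* **9.3 Video and Audio Generation**: learn about emerging technologies such as Sora and Suno
* **9.4 Native Omni-Modal Models and Embodied Intelligence**: explore practical applications of multimodal technology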

