# Chapter 10: Multimodal Prompt Engineering

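Multimodal AI models can understand and process several kinds of input, including text, images, audio, and video. This chapter shows how to design effective multimodal prompts that make full use of these capabilities.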

***

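## Chapter Objectives

* Understand the core concepts of multimodal prompting and the scenarios where it applies
* Master reusable prompt and workflow patterns
* Be able to transfer these methods to your own tasks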

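## Prerequisites

* Reading the previous chapter, or equivalent foundational material, is recommended
* For the code examples, basic familiarity with programming and API calls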

## In This Chapter

* [10.1 Overview of Multimodal Models](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/10.1_multimodal_overview.md)
* [10.2 Image Understanding and Visual Prompting](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/10.2_image_prompting.md)
* [10.3 Audio and Video Processing](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/10.3_audio_video.md)
* [10.4 Cross-Modal Reasoning and Fusion](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/10.4_cross_modal_reasoning.md)
* [10.5 Advanced Multimodal Prompt Engineering: Combining Text, Images, Audio, and Video](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/10.5_multimodal_prompting_advanced.md)
* [10.6 Hands-On Exercises](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/10.6_practice.md)
* [Chapter Summary](/prompt_engineering_guide/di-san-bu-fen-gao-ji-ying-yong-pian/10_multimodal/summary.md)