# Chapter 7: Large-Scale Distributed Training

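When a model's parameter count grows from millions to hundreds of billions, the memory and compute of a single GPU fall far short. Large-scale distributed training is the key engineering technique that turns the Transformer from the small model of the original paper into a large language model with hundreds of billions of parameters.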

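This chapter systematically covers the core strategies of distributed training: data parallelism, ZeRO optimization, tensor parallelism, pipeline parallelism, and mixed-precision training. For each technique, it explains the specific problem it solves and the design logic behind it.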

