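# Chapter 6: The Underlying Logic of Training Techniques

Once the pretraining objective and architecture are fixed, getting training to actually converge to a good solution is an engineering challenge full of technical detail. Choosing the loss function, designing the optimizer, scheduling the learning rate, and selecting a regularization strategy may look like "just hyperparameter tuning," but they rest on deep mathematical principles and a great deal of engineering experience.

This chapter examines the underlying logic of these training techniques, to help readers understand why Adam became the default optimizer, why the learning rate needs to warm up before it decays, and how to strike the best balance between batch size and sequence length.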

