# Chapter 2: The Attention Mechanism: Why It Is the Core

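The attention mechanism is the soul of the Transformer architecture. If the Transformer is a building, attention is the steel frame that holds the whole structure up. Understanding every detail of its design is the foundation for truly mastering the Transformer.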

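The previous chapter covered the background from which attention emerged; this chapter looks at how it works internally. Beyond stating the formulas, it explains the intuition and motivation behind each design choice. Why is the query-key-value triple designed the way it is? What mathematical problem does the scaling factor $$\sqrt{d_k}$$ solve? Why is multi-head attention more powerful than a single head? Why is the causal mask indispensable for generation tasks? What does the $$O(n^2)$$ complexity of attention imply, and how can it be handled?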

