# Chapter 3: Core Components of the Transformer

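The attention mechanism gives the Transformer its ability to model global dependencies, but a complete Transformer involves far more than attention. Tokenization comes first, splitting raw text into discrete tokens; word embeddings map those tokens to continuous vectors; positional encoding injects information about token order; the feed-forward network supplies nonlinear transformation; residual connections keep gradients flowing through deep stacks of layers; and layer normalization stabilizes training.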

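These components may look independent, but together they form the Transformer's complete computational pipeline. None of them is there by accident: each solves a specific technical problem, and removing any one of them significantly weakens the architecture.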
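To make the pipeline idea concrete, here is a minimal pure-Python sketch of how the non-attention components chain together for a single input. It is an illustration under stated assumptions, not a reference implementation: the whitespace tokenizer, the five-word `vocab`, the random weights `table`, `w1`, and `w2`, and the `encode` helper are all hypothetical, the attention sublayer is left out, and the layer normalization has no learnable scale or shift. The residual and normalization are arranged in the post-norm order (normalize after adding the residual).

```python
import math
import random

def tokenize(text):
    # Toy whitespace tokenizer (assumption); real models use subword schemes.
    return text.lower().split()

def positional_encoding(pos, d_model):
    # Sinusoidal encoding: even dimensions use sin, odd dimensions use cos.
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

def feed_forward(x, w1, w2):
    # Position-wise two-layer network with ReLU; biases omitted for brevity.
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]

def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean and unit variance (no learned gain/bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Hypothetical toy parameters: random values stand in for learned weights.
random.seed(0)
d_model, d_ff = 4, 8
vocab = {"attention": 0, "is": 1, "all": 2, "you": 3, "need": 4}
table = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in vocab]
w1 = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
w2 = [[random.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)]

def encode(text):
    outputs = []
    for pos, token in enumerate(tokenize(text)):
        # Embedding lookup plus positional encoding.
        pe = positional_encoding(pos, d_model)
        x = [e + p for e, p in zip(table[vocab[token]], pe)]
        # Feed-forward sublayer wrapped in a residual connection, then layer norm.
        y = feed_forward(x, w1, w2)
        outputs.append(layer_norm([a + b for a, b in zip(x, y)]))
    return outputs

outputs = encode("Attention is all you need")
```

Each input token yields one output vector of length `d_model`. Because the residual is added before normalization, the original embedding signal reaches the output even when the feed-forward weights are uninformative.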

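This chapter examines the principles and design motivation behind each component in turn, and closes by showing how they assemble into the full encoder-decoder architecture.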

