# Chapter Summary

This chapter traced the full path from traditional sequence modeling to the Transformer architecture. The key points are:

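**Three challenges of sequence modeling**: Handling variable-length input, capturing long-range dependencies, and computational efficiency are the basic dimensions along which sequence models are judged.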

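**From text to vectors**: Neural networks cannot operate on raw text. Text must first pass through a "tokenization → index mapping → word embedding" pipeline that turns it into continuous vectors, and this is the input foundation for every sequence model.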
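The three pipeline stages can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `tokenize`, `build_vocab`, and `embed` are hypothetical helper names, whitespace splitting stands in for a real subword tokenizer, and the embedding table is filled with random numbers rather than learned values.

```python
import random

def tokenize(text):
    # Whitespace split: a stand-in for a real subword tokenizer.
    return text.lower().split()

def build_vocab(tokens):
    # Map each distinct token to an integer index; 0 is reserved for unknowns.
    vocab = {"<unk>": 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def embed(ids, table):
    # Embedding lookup is just row selection from a table of vectors.
    return [table[i] for i in ids]

random.seed(0)
tokens = tokenize("the cat sat on the mat")
vocab = build_vocab(tokens)
ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]

dim = 4  # embedding width, chosen arbitrarily for the example
table = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(len(vocab))]
vectors = embed(ids, table)

print(ids)           # the two occurrences of "the" share one index
print(len(vectors))  # one vector per token
```

In a trained model the table is a learned parameter, so the same lookup returns vectors that carry meaning instead of noise.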

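**RNN achievements and limitations**: An RNN handles variable-length input by passing a hidden state from one time step to the next, but vanishing gradients make long-range dependencies hard to learn, and because each step depends on the previous one, computation cannot be parallelized across time steps. LSTM and GRU mitigate vanishing gradients with gating mechanisms, yet the serial bottleneck remains.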
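Both limitations show up even in a toy RNN with a scalar hidden state. The sketch below assumes the standard recurrence $$h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$$; the function names and the weight values are made up for illustration.

```python
import math

def rnn_forward(xs, w_x, w_h, b):
    """Run a scalar-state RNN over a sequence of any length."""
    h = 0.0
    for x in xs:
        # Each step needs the previous h, so this loop cannot be
        # parallelized across time steps.
        h = math.tanh(w_x * x + w_h * h + b)
    return h

def gradient_through_time(xs, w_x, w_h, b):
    """Compute d h_T / d h_0 as a product of per-step factors."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        # Chain rule: d h_t / d h_{t-1} = w_h * (1 - h_t^2)
        grad *= w_h * (1.0 - h * h)
    return grad

final_state = rnn_forward([0.5] * 7, w_x=1.0, w_h=0.5, b=0.0)
short_grad = gradient_through_time([0.5] * 5, w_x=1.0, w_h=0.5, b=0.0)
long_grad = gradient_through_time([0.5] * 50, w_x=1.0, w_h=0.5, b=0.0)
print(short_grad, long_grad)  # the 50-step gradient is far smaller
```

With these weights every per-step factor is at most 0.5 in magnitude, so the gradient shrinks geometrically with sequence length. Gating in LSTM and GRU changes those factors, but the sequential loop stays.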

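**The birth of attention and the encoder-decoder idea**: The Seq2Seq model splits "understanding" and "generation" into two separate modules, an encoder and a decoder, establishing a far-reaching architectural paradigm. To address the information bottleneck caused by the encoder compressing the whole input into one fixed-length vector, Bahdanau et al. proposed the attention mechanism, which lets the decoder selectively attend to different encoder positions and shortens the information path between a decoder step and any input position from $$O(n)$$ to $$O(1)$$.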
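A minimal sketch of one decoder step attending over encoder states follows. For brevity it scores with a plain dot product, whereas Bahdanau attention uses a small additive network for scoring; the `attend` helper and the toy vectors are assumptions for illustration.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    # Score every encoder position against the current decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    # The context vector is a weighted average of all encoder states,
    # so every input position is one step away from the decoder.
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([0.0, 2.0], encoder_states)
print(weights)  # positions 2 and 3 receive more weight than position 1
```

The decoder recomputes the weights at every output step, which is what replaces the single fixed vector of plain Seq2Seq.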

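**The CNN attempt**: One-dimensional CNNs support parallel computation, but the limited receptive field of each layer makes it hard to capture long-range dependencies directly.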
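The receptive-field limit can be put in numbers. The sketch below assumes a stack of 1D convolutions with stride 1 and no dilation, for which the receptive field after $$L$$ layers of kernel size $$k$$ is $$1 + L(k-1)$$; the function names are illustrative.

```python
import math

def receptive_field(num_layers, kernel_size):
    # Each stride-1, undilated layer widens the window by (kernel_size - 1).
    return 1 + num_layers * (kernel_size - 1)

def layers_needed(span, kernel_size):
    # Smallest depth whose receptive field covers `span` tokens.
    return math.ceil((span - 1) / (kernel_size - 1))

print(receptive_field(4, 3))   # tokens visible after 4 layers of width 3
print(layers_needed(1000, 3))  # depth needed to connect tokens 1000 apart
```

Under these assumptions depth grows linearly with the distance to be covered. Dilated convolutions widen the window faster, but the basic trade-off between depth and reach remains.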

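**The core idea of the Transformer**: As the title "Attention Is All You Need" suggests, the model drops recurrence and convolution entirely and is built only from self-attention and feed-forward networks. This design addresses parallel computation, long-range dependencies, and information-path efficiency at the same time.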
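A bare-bones sketch of scaled dot-product self-attention shows why these properties hold. It is deliberately simplified: the input vectors serve as queries, keys, and values directly (no learned projections, no multiple heads, no masking), and `self_attention` is an illustrative name rather than a library function.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        # Iterations are independent of each other, so all positions
        # could be computed in parallel (as a matrix product in practice).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        # Every position mixes information from every other position directly.
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x)
print(out)
```

Unlike the RNN loop, no iteration here depends on the result of another, and any two positions interact in a single layer regardless of how far apart they are.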

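**From paper to industry shift**: The Transformer set off the pre-training revolution (GPT, BERT), drove the discovery of scaling laws (GPT-3), and ultimately became the base architecture of today's large language models (GPT-4, Llama, DeepSeek, and others).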

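The next chapter takes a close look at the Transformer's most central component, the attention mechanism, working through its mathematics and the reasoning behind each design choice.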
