# Part I: Foundations

- [Chapter 1: From Sequence Modeling to the Transformer](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction.md)
  - [1.1 The Fundamental Challenge of Sequence Modeling](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction/1.1_seq_challenge.md)
  - [1.2 RNNs and CNNs: Achievements and Bottlenecks](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction/1.2_rnn_cnn_limits.md)
  - [1.3 The Birth of Attention: Teaching Models Where to Look](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction/1.3_attention_birth.md)
  - [1.4 The Transformer: Its Proposal and Core Ideas](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction/1.4_transformer_idea.md)
  - [1.5 Milestone Moments: From Academic Paper to Industry Transformation](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction/1.5_milestones.md)
  - [Chapter Summary](/llm_internals/di-yi-bu-fen-ji-chu-pian/01_introduction/summary.md)
- [Chapter 2: The Attention Mechanism: Why It Is the Core](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention.md)
  - [2.1 Query-Key-Value: An Information-Retrieval Intuition](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention/2.1_qkv_intuition.md)
  - [2.2 Scaled Dot-Product Attention: Why Divide by √d](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention/2.2_scaled_dot_product.md)
  - [2.3 Multi-Head Attention: Why Multiple Subspaces Are Better](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention/2.3_multi_head.md)
  - [2.4 Self-Attention, Cross-Attention, and Causal Masking](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention/2.4_self_cross_causal.md)
  - [2.5 The Cost of Attention: Complexity and Limitations](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention/2.5_complexity_limits.md)
  - [Chapter Summary](/llm_internals/di-yi-bu-fen-ji-chu-pian/02_attention/summary.md)
- [Chapter 3: Core Components of the Transformer](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components.md)
  - [3.1 Tokenization: From Text to Tokens](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.1_tokenization.md)
  - [3.2 Word Embeddings: From Discrete Symbols to Continuous Vectors](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.2_embedding.md)
  - [3.3 Positional Encoding: Why Order Information Must Be Injected Explicitly](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.3_position_encoding.md)
  - [3.4 The Feed-Forward Network: The Transformer's "Memory Layer"](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.4_feedforward.md)
  - [3.5 Residual Connections: How Gradients Flow Through Hundred-Layer Networks](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.5_residual.md)
  - [3.6 Layer Normalization: Why LayerNorm Rather Than BatchNorm](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.6_layer_norm.md)
  - [3.7 Encoder-Decoder: How the Full Architecture Works Together](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/3.7_full_architecture.md)
  - [Chapter Summary](/llm_internals/di-yi-bu-fen-ji-chu-pian/03_components/summary.md)
- [Chapter 4: The Design Philosophy of Positional Encoding](/llm_internals/di-yi-bu-fen-ji-chu-pian/04_position_encoding.md)
  - [4.1 Sinusoidal Positional Encoding: Intuitions on Frequency and Extrapolation](/llm_internals/di-yi-bu-fen-ji-chu-pian/04_position_encoding/4.1_sinusoidal.md)
  - [4.2 Learnable Positional Encodings: Flexibility and Limitations](/llm_internals/di-yi-bu-fen-ji-chu-pian/04_position_encoding/4.2_learnable.md)
  - [4.3 Rotary Positional Embedding (RoPE): Why Rotation Encodes Relative Position](/llm_internals/di-yi-bu-fen-ji-chu-pian/04_position_encoding/4.3_rope.md)
  - [4.4 ALiBi and Other Relative-Position Schemes](/llm_internals/di-yi-bu-fen-ji-chu-pian/04_position_encoding/4.4_alibi_others.md)
  - [Chapter Summary](/llm_internals/di-yi-bu-fen-ji-chu-pian/04_position_encoding/summary.md)
