# 10.5 Pruning and Knowledge Distillation: Two Routes to a Slimmer Model

## 10.5.1 Pruning

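**Pruning** removes parameters or structures that contribute little to the model's output. **Unstructured pruning** sets individual weights to zero, while **structured pruning** removes entire attention heads or FFN channels. Structured pruning is more hardware-friendly, but it may incur a larger accuracy loss.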
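The difference between the two styles is easiest to see on a single weight matrix. The following is a minimal NumPy sketch, not a production method: it assumes plain weight magnitude (and row L2 norm) as the importance score, and the function names `magnitude_prune_unstructured` and `prune_channels_structured` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def magnitude_prune_unstructured(w, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |w|."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # Indices of the k smallest-magnitude entries in the flattened matrix.
    idx = np.argpartition(np.abs(w).ravel(), k - 1)[:k]
    pruned = w.copy().ravel()
    pruned[idx] = 0.0
    return pruned.reshape(w.shape)

def prune_channels_structured(w, sparsity):
    """Remove whole rows (output channels) with the smallest L2 norm."""
    n_drop = int(round(sparsity * w.shape[0]))
    norms = np.linalg.norm(w, axis=1)
    # Keep the rows with the largest norms, preserving their original order.
    keep = np.sort(np.argsort(norms)[n_drop:])
    return w[keep], keep

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))  # toy weight matrix (assumed shape)

W_unstructured = magnitude_prune_unstructured(W, 0.5)
W_structured, kept_rows = prune_channels_structured(W, 0.5)

# Unstructured: same shape, half the entries are zero.
print(W_unstructured.shape, (W_unstructured == 0).mean())
# Structured: the matrix itself is smaller, so a dense kernel does less work.
print(W_structured.shape, kept_rows)
```

The unstructured result keeps its dense shape and only gains zeros, whereas the structured result is a genuinely smaller dense matrix, which is why the latter maps more directly onto ordinary hardware.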

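Methods such as SparseGPT can prune large language models to 50% or more unstructured sparsity without retraining, with very little loss in accuracy. Note, however, that unstructured sparsity is first and foremost a result about compression and quality retention: without a sparse storage format, sparse kernels, or a hardware-friendly sparsity pattern (such as structured or semi-structured sparsity), wall-clock latency on GPUs will not necessarily drop noticeably.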
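As an illustration of a semi-structured pattern, the sketch below builds a 2:4 mask, in which every group of four consecutive weights keeps exactly two nonzeros. It is an assumption-laden toy: weights are selected by magnitude only, which is a stand-in and not SparseGPT's reconstruction-based procedure, and the helper name `prune_2_of_4` is hypothetical.

```python
import numpy as np

def prune_2_of_4(w):
    """Keep the 2 largest-magnitude weights in each group of 4 along a row."""
    rows, cols = w.shape
    if cols % 4 != 0:
        raise ValueError("column count must be a multiple of 4")
    groups = w.reshape(rows, cols // 4, 4)
    # Positions of the 2 smallest-magnitude entries inside each group of 4.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols), mask.reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))  # toy weight matrix (assumed shape)
W_24, mask_24 = prune_2_of_4(W)

# Overall sparsity is 50%, but the zeros follow a regular, predictable layout.
print(mask_24.astype(int))
print("sparsity:", 1.0 - mask_24.mean())
```

Producing the mask is only half of the story: the latency benefit appears only when the runtime stores the weights in a matching sparse format and dispatches to kernels that exploit the pattern.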

## 10.5.2 Knowledge Distillation

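**Knowledge distillation** trains a small model (the student) on the output probability distribution of a large model (the teacher). The student learns not only the correct answer but also the probabilities the teacher assigns to the incorrect options. This "dark knowledge" carries rich information about inter-class relationships.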

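Formally, the key to distillation is **temperature softening** (Hinton et al., 2015). Dividing both the teacher's and the student's logits by a temperature $$T>1$$ before applying softmax amplifies the small probabilities that would otherwise be squashed toward zero (that is, the dark knowledge). The distillation loss is usually a weighted sum of a hard-label term and a soft-label term: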

$$\mathcal{L}_{\text{KD}} = (1-\alpha)\,\underbrace{\text{CE}(y,\, p_S)}_{\text{hard labels}} + \alpha\, T^2\,\underbrace{\text{KL}\!\left(p_T^{(T)} \,\|\, p_S^{(T)}\right)}_{\text{soft labels}}$$

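Here $$p^{(T)}$$ denotes the distribution softened at temperature $$T$$, with subscripts $$T$$ and $$S$$ marking teacher and student, $$y$$ is the ground-truth label, and $$\alpha$$ weights the two terms. The larger $$T$$ is, the smoother the distribution and the more visible the inter-class relationships; the factor $$T^2$$ compensates for the roughly $$1/T^2$$ shrinkage in the soft-term gradients that softening causes. By the source of the distillation signal, methods fall into several categories: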

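* **Response-based distillation**: matches the teacher's output distribution (as in the loss above); the most common form.
* **Feature-based distillation**: additionally matches intermediate hidden states or attention maps, transferring finer-grained structural information.
* **Sequence-level distillation**: for generative models, has the student match entire sequences generated by the teacher, or has the teacher score trajectories sampled by the student itself (**online / on-policy distillation**), to mitigate the exposure bias of token-level distillation.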
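The response-based loss with temperature softening can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the logits are made-up toy values, the loss is averaged over the batch (a common but not mandatory reduction), and the function names `log_softmax`, `softened` and `kd_loss` are hypothetical helpers rather than any library's API.

```python
import numpy as np

def log_softmax(logits, T=1.0):
    """Numerically stable log-softmax of logits / T along the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softened(logits, T=1.0):
    """Softmax distribution at temperature T."""
    return np.exp(log_softmax(logits, T))

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """(1 - alpha) * CE(y, p_S) + alpha * T^2 * KL(p_T^(T) || p_S^(T))."""
    n = len(labels)
    # Hard-label term: cross-entropy of the student at T = 1.
    ce = -log_softmax(student_logits)[np.arange(n), labels].mean()
    # Soft-label term: KL between softened teacher and student distributions.
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1).mean()
    return (1.0 - alpha) * ce + alpha * T**2 * kl

# Toy batch with one example and three classes (values are illustrative only).
teacher = np.array([[6.0, 2.0, 1.0]])
student = np.array([[3.0, 2.5, 0.5]])
labels = np.array([0])

# A higher temperature moves probability mass onto the non-argmax classes.
print("teacher, T=1:", softened(teacher, 1.0))
print("teacher, T=4:", softened(teacher, 4.0))
print("KD loss:", kd_loss(student, teacher, labels, T=4.0, alpha=0.5))
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted hard-label cross-entropy remains, which is a quick sanity check for any implementation of this loss.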

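DistilBERT is a classic example of knowledge distillation: distilled from BERT-Base, it has 40% fewer parameters and runs 60% faster while retaining 97% of the teacher's language-understanding performance.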

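It is worth stressing that distillation is no longer just an after-the-fact way to "compress an existing model"; it has become part of the **pretraining paradigm** itself. Many current frontier small models (such as the lightweight members of the Gemma and Llama families) are pretrained directly with a stronger, larger model as the teacher signal, which lets small models approach large-model quality at a cost far below that of training from scratch. This echoes the book's assessment that distillation is rapidly eroding the time-based moat of "model leadership".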

