# 9.1 Autoregressive Decoding: How Token-by-Token Generation Works

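Autoregressive decoding is the basic process by which a Transformer decoder generates text: **it generates one token at a time, then appends that token to the existing sequence as context for generating the next one.** This repeats until an end-of-sequence token (such as `</s>` or `<|end_of_text|>`) is generated or the maximum length limit is reached.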

## 9.1.1 The Decoding Loop

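Take generating "Artificial intelligence is changing the world" as an example:

1. Feed in the start token `<s>`; the model outputs a probability distribution over the vocabulary, and "Artificial" is selected.
2. Feed in `<s> Artificial`; the model outputs a probability distribution, and "intelligence" is selected.
3. Feed in `<s> Artificial intelligence`; the model outputs a probability distribution, and "is" is selected.
4. The same step repeats for each remaining token.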
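
The steps above can be sketched as a short loop. This is a minimal illustration, not a real model: `TOY_TABLE`, `toy_next_token_probs`, and `autoregressive_decode` are hypothetical names, the probabilities are made up, and the toy "model" looks only at the last token, whereas a real Transformer conditions on the whole sequence. The next token is chosen greedily here only to keep the loop deterministic.

```python
from typing import Callable, Dict, List

BOS, EOS = "<s>", "</s>"

# Hypothetical stand-in for a language model: last token -> next-token distribution.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    "<s>": {"Artificial": 0.9, "The": 0.1},
    "Artificial": {"intelligence": 0.95, "life": 0.05},
    "intelligence": {"is": 0.8, "was": 0.2},
    "is": {"changing": 0.7, "here": 0.3},
    "changing": {"the": 0.9, "everything": 0.1},
    "the": {"world": 0.85, "game": 0.15},
    "world": {EOS: 1.0},
}


def toy_next_token_probs(tokens: List[str]) -> Dict[str, float]:
    """Return a distribution over the next token (toy: depends on the last token only)."""
    return TOY_TABLE.get(tokens[-1], {EOS: 1.0})


def autoregressive_decode(
    next_probs: Callable[[List[str]], Dict[str, float]],
    max_new_tokens: int = 20,
) -> List[str]:
    tokens = [BOS]
    for _ in range(max_new_tokens):
        probs = next_probs(tokens)              # distribution over the vocabulary
        next_token = max(probs, key=probs.get)  # pick the most probable token
        tokens.append(next_token)               # it becomes context for the next step
        if next_token == EOS:                   # stop at the end-of-sequence token
            break
    return tokens


generated = autoregressive_decode(toy_next_token_probs)
print(" ".join(generated))
```

The loop has exactly the two stopping conditions described earlier: the end-of-sequence token or the `max_new_tokens` limit.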

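At each step the model conceptually processes the full current sequence. Thanks to the KV cache (see [Chapter 10](/llm_internals/di-san-bu-fen-tui-li-yu-bu-shu-pian/10_inference_optimization/10.2_kv_cache.md)), the Keys/Values and layer outputs of earlier tokens do not have to be recomputed from scratch. The actual computation is concentrated on the newly added token, but its Query must still attend over all cached Keys/Values, so the per-step cost still grows with context length when the context is long.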
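
A rough way to see this growth is to count query-key dot products per decoding step. The sketch below is a simplified counting model, not a measurement: it considers a single attention head in a single layer, ignores the feed-forward layers and the cost of computing the projections, and the function names are hypothetical.

```python
def attention_dot_products(context_len: int, use_kv_cache: bool) -> int:
    """Query-key dot products for one decoding step (one head, one layer).

    context_len is the length of the sequence fed to the model at this step.
    """
    if use_kv_cache:
        # Only the newest token's Query is computed; it attends to every key.
        return context_len
    # Without a cache, every position's Query is recomputed under the causal
    # mask: position i attends to i keys, so the total is 1 + 2 + ... + n.
    return context_len * (context_len + 1) // 2


def total_dot_products(num_steps: int, use_kv_cache: bool, prompt_len: int = 1) -> int:
    """Sum the per-step counts over a whole generation run."""
    return sum(
        attention_dot_products(prompt_len + step, use_kv_cache)
        for step in range(num_steps)
    )


for n in (10, 100, 1000):
    cached = attention_dot_products(n, use_kv_cache=True)
    uncached = attention_dot_products(n, use_kv_cache=False)
    print(f"context={n:5d}  with cache={cached:7d}  without cache={uncached:9d}")
```

Under these assumptions the cached count is linear in the context length rather than constant, which is why long contexts remain costly per step even with a KV cache.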

## 9.1.2 The Core Question of Decoding

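At each step the model outputs a probability distribution over the entire vocabulary (typically tens of thousands to hundreds of thousands of tokens). **How to choose the next token from this distribution** is the central question of a decoding strategy. Different answers to it give rise to greedy search, beam search, and the various sampling strategies.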

The key trade-off is between **quality** (choosing the most probable token) and **diversity** (exploring more possibilities).
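
The trade-off can be illustrated on a single next-token distribution. This is a minimal sketch with made-up probabilities and hypothetical function names: `greedy_pick` always returns the most probable token, while `sample_pick` draws a token in proportion to its probability.

```python
import random
from collections import Counter
from typing import Dict, Optional

# Made-up next-token distribution, for illustration only.
next_token_probs: Dict[str, float] = {"world": 0.6, "future": 0.3, "game": 0.1}


def greedy_pick(probs: Dict[str, float]) -> str:
    """Always choose the single most probable token."""
    return max(probs, key=probs.get)


def sample_pick(probs: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Draw one token at random, weighted by its probability."""
    rng = rng or random.Random()
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
greedy_counts = Counter(greedy_pick(next_token_probs) for _ in range(1000))
sample_counts = Counter(sample_pick(next_token_probs, rng) for _ in range(1000))

print("greedy  :", dict(greedy_counts))
print("sampling:", dict(sample_counts))
```

Greedy selection returns the same token every time, while sampling spreads its choices across the vocabulary roughly in proportion to the probabilities.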

