# A.3 Quick Reference: Parameters of Mainstream Models

| Model | Release date | Architecture | Parameters | Layers | Hidden size | Attention heads | Context length | Key features |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Transformer | 2017.06 | Enc-Dec | 65M/213M | 6 | 512/1024 | 8/16 | - | Original architecture |
| BERT-Base         | 2018.10    | Encoder | 110M        | 12  | 768      | 12   | 512                         | MLM + NSP                                                                       |
| BERT-Large        | 2018.10    | Encoder | 340M        | 24  | 1024     | 16   | 512                         | MLM + NSP                                                                       |
| GPT-2 | 2019.02 | Decoder | 1.5B | 48 | 1600 | 25 | 1024 | Autoregressive LM |
| GPT-3 | 2020.06 | Decoder | 175B | 96 | 12288 | 96 | 2048 | Few-shot learning |
| T5-Large | 2019.10 | Enc-Dec | 770M | 24 | 1024 | 16 | 512 | Text-to-text |
| Llama 2-7B | 2023.07 | Decoder | 7B | 32 | 4096 | 32 | 4096 | RoPE + SwiGLU (MHA; GQA only in 34B/70B) |
| Llama 2-70B | 2023.07 | Decoder | 70B | 80 | 8192 | 64 | 4096 | GQA (8 KV heads) |
| Llama 3-8B | 2024.04 | Decoder | 8B | 32 | 4096 | 32 | 8192 | 128K vocabulary |
| Llama 3-70B       | 2024.04    | Decoder | 70B         | 80  | 8192     | 64   | 8192                        | GQA                                                                             |
| Llama 3.1-405B | 2024.07 | Decoder | 405B | 126 | 16384 | 128 | 128K | Open-weight model competitive with GPT-4 |
| GPT-4o mini | 2024.07 | Decoder | Undisclosed | - | - | - | 128K | Highly cost-efficient |
| Claude 3.5 Sonnet | 2024.06 | Decoder | Undisclosed | - | - | - | 200K | Artifacts/Computer Use |
| o1 | 2024.12 | Decoder | Undisclosed | - | - | - | 200K | Inference-time compute scaling |
| o1-mini | 2024.09 | Decoder | Undisclosed | - | - | - | 128K | Lightweight reasoning |
| Qwen 2.5-72B | 2024.09 | Decoder | 72B | 80 | 8192 | 64 | 128K | Multilingual, code, math |
| DeepSeek-R1 | 2025.01.20 | MoE-Dec | 671B (37B active) | 61 | 7168 | 128 | 128K | Cold-start + multi-stage training |
| Mistral 7B | 2023.09 | Decoder | 7B | 32 | 4096 | 32 | 32K | Sliding-window attention |
| DeepSeek-V3 | 2024.12 | MoE-Dec | 671B (37B active) | 61 | 7168 | 128 | 128K | MoE + FP8 |
| Claude 3.7 Sonnet | 2025.02.24 | Decoder | Undisclosed | - | - | - | 200K | Hybrid reasoning |
| Claude Opus 4 | 2025.05 | Decoder | Undisclosed | - | - | - | 200K | Multimodal and agentic capabilities |
| Claude Sonnet 4.5 | 2025.09.29 | Decoder | Undisclosed | - | - | - | 200K | High-performance reasoning |
| Claude Haiku 4.5 | 2025.10.15 | Decoder | Undisclosed | - | - | - | 200K | Fast and lightweight |
| Gemini 2.5 Pro | 2025.03.25 | Multimodal | Undisclosed | - | - | - | 1M | Natively multimodal |
| Gemini 3 Pro | 2025.11.18 | Multimodal | Undisclosed | - | - | - | 1M | Natively multimodal |
| Gemini 3.1 Pro | 2026.02.19 | Multimodal | Undisclosed | - | - | - | 1M | Natively multimodal |
| o3 | 2025.04 | Decoder | Undisclosed | - | - | - | 200K | Reasoning model |
| Claude Opus 4.6 | 2026.02.05 | Decoder | Undisclosed | - | - | - | 1M | Enhanced reasoning |
| Claude Sonnet 4.6 | 2026.02.17 | Decoder | Undisclosed | - | - | - | 1M | Long-context capability |
| Claude Opus 4.7 | 2026.04.16 | Decoder | Undisclosed | - | - | - | 1M | Software engineering (SWE-bench 87.6%, GPQA 94.2%, Terminal-Bench 2.0 69.4%, Finance Agent 64.4%) |
| Llama 4 | 2025.04 | MoE-Dec | Scout: 109B (17B active); Maverick: 400B (17B active) | - | - | - | 1M-10M (Maverick 1M, Scout 10M) | MoE architecture |
| GPT-5 | 2025.08.07 | Decoder | Undisclosed | - | - | - | Undisclosed | Multimodal reasoning |
| GPT-5.1 | 2025.11 | Decoder | Undisclosed | - | - | - | Undisclosed | Iterative update |
| GPT-5.2 | 2025.12.11 | Decoder | Undisclosed | - | - | - | Undisclosed | Flagship reasoning model |
| GPT-5.3 | 2026.02.05 | Decoder | Undisclosed | - | - | - | Undisclosed | Codex integration |
| GPT-5.4 | 2026.03.05 | Decoder | Undisclosed | - | - | - | Undisclosed | Unified reasoning and coding |
| GPT-5.4 mini | 2026.03 | Decoder | Undisclosed | - | - | - | Undisclosed | Highly cost-efficient |
| GPT-5.4 nano | 2026.03 | Decoder | Undisclosed | - | - | - | Undisclosed | Ultra-lightweight |
| GPT-5.5 | 2026.04.23 | Decoder | Undisclosed | - | - | - | Undisclosed | Latest flagship ($5/$30) |

Table A-1: Quick reference of parameters for mainstream Transformer models

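## Key abbreviations

* **Enc-Dec**: encoder-decoder architecture
* **MoE-Dec**: mixture-of-experts decoder architecture
* **MLM**: masked language modeling
* **NSP**: next sentence prediction
* **GQA**: grouped-query attention
* **RoPE**: rotary position embedding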

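## Usage notes

* Use the parameter counts and context lengths in this table mainly to build a quick order-of-magnitude intuition. For closed-source models that iterate quickly, read them alongside the timeline notes in the main text.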
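
As a rough illustration of that order-of-magnitude intuition, the sketch below estimates non-embedding parameter counts from the layer count and hidden size listed in the table, using the common approximation of 12 · L · d² parameters for a dense Transformer stack. The function name `approx_params` is hypothetical, and the formula assumes standard multi-head attention with a 4x-wide feed-forward block, so it is less accurate for models that use GQA, SwiGLU, or MoE layers.

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count: 12 * L * d^2.

    Assumption: each layer has about 4*d^2 attention weights (Q, K, V, O)
    and about 8*d^2 feed-forward weights (two d x 4d matrices).
    Embeddings, biases, and layer norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


# (layers, hidden size) pairs copied from the table above
models = {
    "BERT-Large": (24, 1024),
    "GPT-2": (48, 1600),
    "GPT-3": (96, 12288),
}

for name, (n_layers, d_model) in models.items():
    estimate = approx_params(n_layers, d_model)
    print(f"{name}: ~{estimate / 1e9:.2f}B non-embedding parameters")
```

For GPT-3 the estimate comes out near 174B, close to the listed 175B. For smaller models such as BERT-Large, the gap to the listed figure is larger because embedding parameters make up a bigger share of the total.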

