# Chapter 8: From Pre-training to Alignment: Making Models Useful and Safe

The "base model" produced by pre-training holds rich linguistic knowledge, but it **does not know how to interact effectively with humans**: it will continue any text indiscriminately, including harmful content. Turning this "raw brain", equally capable of writing poetry and malicious code, into a helpful, honest, and safe assistant requires a process of **post-training alignment**.

This chapter walks through the three core stages of post-training, namely supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimization (DPO), as well as parameter-efficient fine-tuning techniques, explaining the design motivation behind each method and its respective trade-offs.
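
As a concrete preview of where the chapter is headed, the sketch below shows the DPO objective in a few lines of PyTorch. It is a minimal illustration under simplifying assumptions, not a production implementation; the tensor names (`policy_chosen_logps` and friends) are placeholders for per-sequence log-probabilities computed elsewhere.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each argument holds per-sequence log-probabilities (summed over
    response tokens) under either the policy being trained or a frozen
    reference model, for the human-preferred ("chosen") and the
    dispreferred ("rejected") response.
    """
    # Implicit reward of each response: how much more likely the
    # policy makes it, relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the reward margin between the
    # chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```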

