# 7.5 推理预算与思考过程管理

Extended Thinking 提升准确度但也增加成本，需要精心的预算管理。本节介绍四种思考策略（禁用、自适应、预算控制、强制）、复杂度评估方法、思考质量分析，以及会话级的预算跟踪机制。

## 7.5.1 推理预算的意义

Adaptive Thinking（自适应思考）是 Claude 的现代特性，模型在“思考”中投入 token 进行深度推理，显著提升复杂任务的准确度。但思考本身有成本：

> **更新**: Extended Thinking 的手动模式（`type: "enabled"`）已在 Claude Opus 4.7 弃用，Adaptive Thinking（`type: "adaptive"`）是官方推荐方案。

| 指标              | 说明                   | 影响                |
| --------------- | -------------------- | ----------------- |
| **思考 Token 成本** | 与输出 Token 同价（按输出价计费） | 大量思考会增加成本         |
| **推理时间**        | 思考通常需要额外的推理步骤        | 延长响应时间            |
| **质量收益**        | 复杂任务准确度提升 30-50%     | 不是所有任务都需要         |
| **可预测性**        | 思考深度难以精确控制           | budget\_tokens 限制 |

## 7.5.2 核心概念

推理预算的核心数据结构定义如下：

```python
from enum import Enum
from dataclasses import dataclass
from typing import Optional

class ThinkingStrategy(Enum):
    """思考策略"""
    DISABLED = "disabled"  # 不使用思考
    ADAPTIVE = "adaptive"  # 自适应(模型决定)
    REQUIRED = "required"  # 强制思考
    BUDGET_BASED = "budget_based"  # 基于预算的条件思考

@dataclass
class ReasoningBudget:
    """推理预算配置"""
    strategy: ThinkingStrategy
    max_thinking_tokens: Optional[int] = None  # 单次最大思考 token
    max_thinking_per_session: Optional[int] = None  # 会话总思考 token
    budget_threshold: Optional[float] = None  # 成本阈值,超过则不用思考
    adaptive_threshold: Optional[float] = None  # 自适应触发阈值
    task_complexity_threshold: Optional[str] = None  # 复杂度阈值

@dataclass
class ThinkingResult:
    """思考结果"""
    thinking_tokens: int
    thinking_content: str
    output_tokens: int
    output_content: str
    total_cost: float
    thinking_ratio: float  # 思考 token / 总 token
```

### 策略 1: 完全禁用思考

最经济的选择，适合简单任务：

```python
class NoThinkingProvider:
    """不使用思考的提供者"""

    def __init__(self, config):
        self.config = config

    def complete(self, messages, task_type: str = "generic"):
        """不启用 Extended Thinking"""
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            messages=messages,
            # 注意:没有 thinking 参数
        )
        return ThinkingResult(
            thinking_tokens=0,
            thinking_content="",
            output_tokens=response.usage.output_tokens,
            output_content=response.content[0].text,
            total_cost=0,
            thinking_ratio=0.0
        )
```

### 策略 2: 自适应思考

让模型自主决定是否思考，适合混合工作负载：

```python
import anthropic

class AdaptiveThinkingProvider:
    """自适应思考提供者"""

    def __init__(self, config: ReasoningBudget):
        self.config = config
        self.client = anthropic.Anthropic()

    def complete(self, messages, task_type: str = "generic"):
        """启用自适应思考,模型自主决定"""
        # 自适应模式:模型根据任务复杂度自主决定是否思考
        # 在 Claude 4.7+,自适应模式自动管理思考深度
        response = self.client.messages.create(
            model="claude-opus-4-7",
            max_tokens=16000,
            thinking={
                "type": "adaptive",  # 推荐方式:模型自主决策思考深度
            },
            messages=messages,
        )

        # 解析响应中的思考块
        thinking_content = ""
        output_content = ""
        thinking_tokens = 0
        output_tokens = 0

        for block in response.content:
            if block.type == "thinking":
                thinking_content = block.thinking
                # 从 usage 中获取思考 token 数
            elif block.type == "text":
                output_content = block.text

        # 通过 response.usage 获取准确的 token 统计
        thinking_tokens = getattr(response.usage, "thinking_tokens", 0)
        output_tokens = response.usage.output_tokens

        total_cost = self._calculate_cost(
            thinking_tokens,
            output_tokens
        )

        return ThinkingResult(
            thinking_tokens=thinking_tokens,
            thinking_content=thinking_content,
            output_tokens=output_tokens,
            output_content=output_content,
            total_cost=total_cost,
            thinking_ratio=thinking_tokens / (thinking_tokens + output_tokens)
            if (thinking_tokens + output_tokens) > 0 else 0.0
        )

    def _calculate_cost(self, thinking_tokens: int, output_tokens: int) -> float:
        """计算思考+输出的总成本(此示例基于 Opus 4.7 输出价 $25/1M;若改用 Sonnet 4.6 则为 $15/1M,Haiku 4.5 为 $5/1M)"""
        # 思考 token 与输出 token 同价,Opus 4.7 输出为 $25/1M
        thinking_cost = thinking_tokens * 0.025 / 1000  # $25/1M(与输出同价)
        output_cost = output_tokens * 0.025 / 1000  # $25/1M
        return thinking_cost + output_cost
```

### 策略 3: 基于预算的条件思考

根据成本和任务复杂度动态决定：

```python
class BudgetBasedThinkingProvider:
    """基于预算的思考提供者"""

    def __init__(self, config: ReasoningBudget):
        self.config = config
        self.client = anthropic.Anthropic()
        self.session_thinking_tokens = 0
        self.session_cost = 0.0

    def complete(self, messages, task_type: str = "generic"):
        """根据预算和任务复杂度决定是否思考"""

        # 评估任务复杂度
        complexity = self._assess_complexity(messages, task_type)

        # 检查成本预算
        estimated_thinking_cost = self._estimate_cost(complexity)
        can_afford = (
            self.session_cost + estimated_thinking_cost <
            self.config.budget_threshold
        )

        # 检查 token 预算
        can_afford_tokens = (
            self.session_thinking_tokens +
            (self.config.max_thinking_tokens or 10000) <
            (self.config.max_thinking_per_session or 100000)
        )

        should_think = (
            complexity in ["hard", "very_hard"] and
            can_afford and
            can_afford_tokens
        )

        if should_think:
            return self._complete_with_thinking(messages)
        else:
            return self._complete_without_thinking(messages)

    def _assess_complexity(self, messages: list, task_type: str) -> str:
        """评估任务复杂度"""
        # 基于消息长度、任务类型等因素
        total_tokens = sum(
            len(msg.get("content", "").split()) * 1.3
            for msg in messages
        )

        type_complexity = {
            "reasoning": "hard",
            "coding": "hard",
            "analysis": "medium",
            "summarization": "easy",
            "generic": "medium",
        }

        base_complexity = type_complexity.get(task_type, "medium")

        if total_tokens > 5000:
            return "very_hard" if base_complexity == "hard" else base_complexity
        elif total_tokens > 2000:
            return base_complexity
        else:
            return "easy"

    def _estimate_cost(self, complexity: str) -> float:
        """估算思考成本"""
        complexity_to_tokens = {
            "easy": 2000,
            "medium": 5000,
            "hard": 10000,
            "very_hard": 15000,
        }
        tokens = complexity_to_tokens.get(complexity, 5000)
        return tokens * 0.025 / 1000  # 思考 token 成本(与 Opus 4.7 输出同价 $25/1M)

    def _complete_with_thinking(self, messages):
        """带思考的完成"""
        # 注:在 Claude 4.7+ 使用 type:"adaptive",旧模型可用 type:"enabled" with budget_tokens
        response = self.client.messages.create(
            model="claude-opus-4-7",  # Opus 4.6+ 推荐使用 adaptive 模式
            max_tokens=16000,
            thinking={
                "type": "adaptive",  # 推荐改用 adaptive 而非已弃用的 enabled
            },
            messages=messages,
        )
        # 解析思考块和输出
        thinking_content = ""
        output_content = ""
        for block in response.content:
            if block.type == "thinking":
                thinking_content = block.thinking
            elif block.type == "text":
                output_content = block.text

        thinking_tokens = getattr(response.usage, "thinking_tokens", 0)
        output_tokens = response.usage.output_tokens

        return ThinkingResult(
            thinking_tokens=thinking_tokens,
            thinking_content=thinking_content,
            output_tokens=output_tokens,
            output_content=output_content,
            total_cost=0.0,
            thinking_ratio=thinking_tokens / (thinking_tokens + output_tokens)
            if (thinking_tokens + output_tokens) > 0 else 0.0,
        )

    def _complete_without_thinking(self, messages):
        """不使用思考的完成"""
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            messages=messages,
        )
        output_content = ""
        for block in response.content:
            if block.type == "text":
                output_content = block.text

        return ThinkingResult(
            thinking_tokens=0,
            thinking_content="",
            output_tokens=response.usage.output_tokens,
            output_content=output_content,
            total_cost=0.0,
            thinking_ratio=0.0,
        )
```

### 策略 4: 强制思考

某些任务（如代码审查、安全决策）必须激活思考：

```python
class RequiredThinkingProvider:
    """强制思考提供者"""

    CRITICAL_TASK_TYPES = {
        "security_decision",
        "financial_transaction",
        "code_review",
        "medical_advice",
    }

    def __init__(self, config: ReasoningBudget):
        self.config = config
        self.client = anthropic.Anthropic()

    def complete(self, messages, task_type: str = "generic"):
        """对关键任务强制启用思考"""
        if task_type not in self.CRITICAL_TASK_TYPES:
            raise ValueError(
                f"任务类型 {task_type} 不需要强制思考"
            )

        # 构建强制思考的提示
        enhanced_messages = self._enhance_for_deep_thinking(messages)

        response = self.client.messages.create(
            model="claude-opus-4-7",
            max_tokens=16000,
            thinking={
                "type": "adaptive",  # adaptive 自动管理思考深度,无需显式 budget_tokens
            },
            messages=enhanced_messages,
        )

        # ... 验证思考深度
        thinking_content = ""
        for block in response.content:
            if block.type == "thinking":
                thinking_content = block.thinking
                break

        if not thinking_content or len(thinking_content) < 500:
            raise ValueError(
                "思考内容不足,可能是关键任务的思考不够深入"
            )

        # 解析输出内容
        output_content = ""
        for block in response.content:
            if block.type == "text":
                output_content = block.text
                break

        thinking_tokens = getattr(response.usage, "thinking_tokens", 0)
        output_tokens = response.usage.output_tokens

        return ThinkingResult(
            thinking_tokens=thinking_tokens,
            thinking_content=thinking_content,
            output_tokens=output_tokens,
            output_content=output_content,
            total_cost=0.0,
            thinking_ratio=thinking_tokens / (thinking_tokens + output_tokens)
            if (thinking_tokens + output_tokens) > 0 else 0.0,
        )

    def _enhance_for_deep_thinking(self, messages):
        """增强消息以促进更深的思考"""
        enhanced = messages.copy()
        enhanced[-1]["content"] += (
            "\n\n请花时间仔细思考这个问题的所有方面,"
            "包括潜在的风险和边界情况。"
        )
        return enhanced
```

### 思考结果分析

对思考过程进行质量分析的实现方式如下：

```python
class ThinkingAnalyzer:
    """思考过程分析"""

    def analyze_thinking_quality(self, result: ThinkingResult) -> dict:
        """分析思考质量和成本效率"""

        analysis = {
            "thinking_ratio": result.thinking_ratio,
            "cost_efficiency": self._calculate_efficiency(result),
            "thinking_depth": self._assess_depth(result.thinking_content),
            "quality_indicators": self._extract_quality_indicators(result),
        }

        return analysis

    def _calculate_efficiency(self, result: ThinkingResult) -> float:
        """计算成本效率(质量/成本)"""
        if result.total_cost == 0:
            return 0
        # 假设有一个质量评分
        quality_score = self._estimate_quality(result.output_content)
        return quality_score / result.total_cost

    def _assess_depth(self, thinking_content: str) -> str:
        """评估思考深度"""
        if not thinking_content:
            return "none"
        if len(thinking_content) < 500:
            return "shallow"
        elif len(thinking_content) < 2000:
            return "moderate"
        else:
            return "deep"

    def _extract_quality_indicators(self, result: ThinkingResult) -> dict:
        """从思考过程中提取质量指标"""
        thinking = result.thinking_content

        indicators = {
            "considers_alternatives": "另一种" in thinking or "也可以" in thinking,
            "identifies_constraints": "限制" in thinking or "约束" in thinking,
            "checks_edge_cases": "边界" in thinking or "特殊情况" in thinking,
            "shows_uncertainty": "可能" in thinking or "不确定" in thinking,
        }

        return indicators

    def should_retry_with_more_thinking(self, result: ThinkingResult) -> bool:
        """判断是否应该用更多思考重试"""
        # 如果输出质量低且思考比例低,可以重试
        quality = self._estimate_quality(result.output_content)
        if quality < 0.5 and result.thinking_ratio < 0.3:
            return True
        return False
```

### 推理预算管理器

推理预算管理器的完整实现如下：

```python
class ReasoningBudgetManager:
    """推理预算管理"""

    def __init__(self, config: ReasoningBudget):
        self.config = config
        self.session_stats = {
            "total_thinking_tokens": 0,
            "total_cost": 0.0,
            "request_count": 0,
            "avg_thinking_ratio": 0.0,
        }

    def should_enable_thinking(
        self,
        task_type: str,
        estimated_complexity: str
    ) -> bool:
        """决定是否启用思考"""
        if self.config.strategy == ThinkingStrategy.DISABLED:
            return False
        elif self.config.strategy == ThinkingStrategy.REQUIRED:
            return True
        elif self.config.strategy == ThinkingStrategy.ADAPTIVE:
            return estimated_complexity in ["hard", "very_hard"]
        elif self.config.strategy == ThinkingStrategy.BUDGET_BASED:
            # 检查成本预算
            can_afford = (
                self.session_stats["total_cost"] < self.config.budget_threshold
            )
            return can_afford and estimated_complexity != "easy"
        return False

    def record_thinking_usage(self, result: ThinkingResult):
        """记录思考使用情况"""
        self.session_stats["total_thinking_tokens"] += result.thinking_tokens
        self.session_stats["total_cost"] += result.total_cost
        self.session_stats["request_count"] += 1

        # 更新平均思考比例
        prev_avg = self.session_stats["avg_thinking_ratio"]
        n = self.session_stats["request_count"]
        self.session_stats["avg_thinking_ratio"] = (
            (prev_avg * (n - 1) + result.thinking_ratio) / n
        )

    def get_budget_status(self) -> dict:
        """获取预算使用状态"""
        return {
            "used": self.session_stats["total_cost"],
            "limit": self.config.budget_threshold,
            "remaining": (
                self.config.budget_threshold -
                self.session_stats["total_cost"]
            ),
            "requests": self.session_stats["request_count"],
            "avg_thinking_ratio": self.session_stats["avg_thinking_ratio"],
        }
```

### 使用示例

推理预算管理器的使用示例如下：

```python
# 配置基于预算的思考
config = ReasoningBudget(
    strategy=ThinkingStrategy.BUDGET_BASED,
    max_thinking_tokens=15000,
    max_thinking_per_session=100000,
    budget_threshold=0.50,  # 最多 $0.50
    task_complexity_threshold="medium"
)

manager = ReasoningBudgetManager(config)
provider = BudgetBasedThinkingProvider(config)

# 处理任务
result = provider.complete(messages, task_type="reasoning")

# 分析结果
analyzer = ThinkingAnalyzer()
analysis = analyzer.analyze_thinking_quality(result)
print(f"思考深度: {analysis['thinking_depth']}")
print(f"成本效率: {analysis['cost_efficiency']:.4f}")

# 记录使用
manager.record_thinking_usage(result)
status = manager.get_budget_status()
print(f"预算使用: ${status['used']:.4f} / ${status['limit']:.4f}")
```

### 总结

推理预算管理通过：

* **多种策略** （禁用、自适应、预算、强制）满足不同需求
* **动态复杂度评估** 决定思考投入
* **成本和质量平衡** 优化成本效益
* **会话级预算跟踪** 防止失控支出

这是生产智能体系统必不可少的能力，尤其是当扩展思考成为标配时。

## 附注：Extended Thinking 的弃用

Claude 官方已弃用手动 Extended Thinking 配置（`thinking: {type: "enabled", budget_tokens: N}`）：

| 模型                         | 状态                |
| -------------------------- | ----------------- |
| Claude Opus 4.7+           | ❌ 不支持，返回 400 错误   |
| Claude Opus 4.6、Sonnet 4.6 | ⚠️ 已弃用但功能正常（计划迁移） |
| 早期模型（4.5及更早）               | ✅ 仍支持（若无法升级）      |

**迁移指南**: 使用 `thinking: {type: "adaptive"}` 替代。通过 `output_config.effort` 参数控制思考深度（`max`, `xhigh`, `high`, `medium`, `low`），而非 `budget_tokens`。


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://yeasy.gitbook.io/harness_engineering_guide/di-er-bu-fen-harness-he-xin-zi-xi-tong/07_model_integration/7.5_reasoning_budget.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.