
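# 11.2 Cooldown and Disable: Stopping the Bleeding Within the Failure Window

This section focuses on the cooldown and disable mechanisms at the **auth profile level**: when an API key or OAuth profile keeps returning errors, how the system automatically backs off and rotates so that retries do not amplify the failure. For model-level, cross-model fallback (`fallbacks`), see [11.3](/openclaw_guide/di-san-bu-fen-shi-xian-yuan-li-yu-gong-cheng-luo-di/11_reliability_security/11.3_fallback_rules.md).

## 11.2.1 What Cooldown Solves: Preventing Retry Amplification

When a provider is unstable or rate-limiting, retrying the same target over and over creates an amplification chain: failures trigger retries, retries consume concurrency and quota, and the resulting queue buildup causes end-to-end timeouts.

The goal of cooldown is to cut futile attempts during the failure window and leave resources for paths that can still succeed.

## 11.2.2 Auth Profile Cooldown: The Official Tiered Mechanism

OpenClaw has a built-in tiered cooldown mechanism at the auth-profiles level. When an auth profile (API key or OAuth) keeps failing, ordinary failures follow a short cooldown ladder, and profiles in cooldown are moved to the end of the order during profile rotation; exponential backoff is used mainly for disable windows such as billing and auth permanent errors. Reference: <https://docs.openclaw.ai/concepts/model-failover>

It can be abstracted as the following state transition chain: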

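```mermaid
flowchart TD
  O["Healthy profile"] --> F["Request fails"]
  F --> C{"Failure count increases"}
  C -->|"1-2 failures"| CD["30s / 60s short cooldown"]
  C -->|"3+ failures"| CAP["5m capped cooldown"]
  C -->|"billing / auth permanent"| BD["Disable window with exponential backoff"]
  CAP --> R["Rotate to the back / take fallback path"]
  CD --> R
  BD --> R
  R --> P["Probe or later request retries"]
  P -->|"Success"| O
  P -->|"Still failing"| F
```

Figure 11-3: State transitions for auth profile cooldown and disable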

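**Cooldown tiers (increasing with failure count):**

> \[!NOTE] This revision follows the local OpenClaw implementation: in the current checkout, the auth-profile short cooldown is still 30 seconds, 1 minute, then capped at 5 minutes. If the official documentation or a later version moves to a longer ladder, re-verify against the target version's source code and `models status --probe` before upgrading.

| Failure count | Cooldown duration (current local implementation) |
| ---- | ------------ |
| 1    | 30 seconds   |
| 2    | 1 minute     |
| 3    | 5 minutes    |
| 4+   | 5 minutes (capped) |

Treat "credential/profile persistence" and "runtime auth routing state" as separate concerns: `auth-profiles.json` stores the auth profiles themselves, while runtime cooldown, disable, and routing state should be managed independently and should no longer be described as fields of `auth-profiles.json`. A runtime state record carries fields such as the following: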
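
The ladder in the table can be read as a simple lookup with a clamp at the top rung. The following is a minimal sketch under that reading; the function and constant names are hypothetical and are not taken from the OpenClaw source.

```python
# Illustrative sketch only: the durations come from the table above;
# the names below are hypothetical, not OpenClaw identifiers.
COOLDOWN_LADDER_SECONDS = [30, 60, 300]  # 1st, 2nd, 3rd-and-later failure


def cooldown_seconds(error_count: int) -> int:
    """Return the short-cooldown duration for a given failure count."""
    if error_count <= 0:
        return 0  # no recorded failures, so no cooldown
    # Clamp the index so every count past the last rung stays at the cap.
    index = min(error_count, len(COOLDOWN_LADDER_SECONDS)) - 1
    return COOLDOWN_LADDER_SECONDS[index]
```

Under this sketch, the third failure and every later one map to the same 300-second cap, which matches the last two rows of the table.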

```json
{
  "usageStats": {
    "provider:profile": {
      "lastUsed": 1736160000000,
      "cooldownUntil": 1736160600000,
      "errorCount": 2
    }
  }
}
```
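
To illustrate how a `cooldownUntil` timestamp could drive rotation, the sketch below pushes profiles that are still cooling down to the end of the candidate order. It assumes the record shape shown in the JSON above (millisecond timestamps keyed by `provider:profile`); the function names are hypothetical and the real rotation logic may weigh other signals as well.

```python
import time


def is_cooling_down(stats: dict, now_ms: int) -> bool:
    """A profile is cooling down while cooldownUntil lies in the future."""
    return stats.get("cooldownUntil", 0) > now_ms


def order_profiles(profile_ids, usage_stats, now_ms=None):
    """Return profile ids with cooling-down profiles moved to the back."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # sorted() is stable: False (usable) sorts before True (cooling down),
    # and the relative order inside each group is preserved.
    return sorted(
        profile_ids,
        key=lambda pid: is_cooling_down(usage_stats.get(pid, {}), now_ms),
    )
```

Once the timestamp has passed, the same call returns the original order again, so no explicit "un-cool" step is needed in this simplified model.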

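**Billing disable backoff (specific to billing errors):**

* Starts at 5 hours, doubles on each billing failure, and is capped at 24 hours.
* Marked separately with `disabledReason: "billing"` to distinguish it from ordinary cooldown.
* The counter resets if no failure occurs within 24 hours.

**Configurable fields (`openclaw.json`):**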
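
The three rules above amount to a capped doubling schedule with a reset window. A minimal sketch of the arithmetic follows; the function names are hypothetical, and the exact point at which the real implementation evaluates the reset window is an assumption.

```python
HOUR_MS = 60 * 60 * 1000


def billing_disable_hours(billing_failures, start_hours=5, max_hours=24):
    """Capped doubling: 5h, 10h, 20h, then held at 24h."""
    if billing_failures <= 0:
        return 0
    return min(start_hours * 2 ** (billing_failures - 1), max_hours)


def effective_failure_count(count, last_failure_ms, now_ms, window_hours=24):
    """Assumed reset rule: a full quiet window zeroes the counter."""
    if now_ms - last_failure_ms >= window_hours * HOUR_MS:
        return 0
    return count
```

With the default values, the fourth consecutive billing failure would compute 40 hours and is therefore held at the 24-hour cap.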

```jsonc
{
  auth: {
    cooldowns: {
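      billingBackoffHours: 5,           // initial billing backoff (hours)
      billingBackoffHoursByProvider: {  // per-provider override of the initial backoff
        openai: 3,
      },
      billingMaxHours: 24,              // maximum billing disable duration (hours)
      failureWindowHours: 24,           // window without failures after which the counter resets
      authPermanentBackoffMinutes: 10,  // initial backoff for permanent auth errors (minutes)
      authPermanentMaxMinutes: 60,      // maximum backoff for permanent auth errors (minutes)
      overloadedProfileRotations: 1,    // max same-provider profile switches on overloaded errors
      overloadedBackoffMs: 0,           // short backoff before falling back on overloaded errors
      rateLimitedProfileRotations: 1,   // max same-provider profile switches on rate-limit errors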
    },
  },
}
```
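
As a rough illustration of how these fields could combine, the sketch below resolves the per-provider billing override and computes an auth-permanent backoff. It assumes that the override simply replaces the global starting value and that the auth-permanent window doubles per failure up to its maximum; both are assumptions for illustration, as are the function names and the fallback values (copied from the example above, not confirmed defaults).

```python
def billing_start_hours(cooldowns: dict, provider: str):
    """Per-provider value wins; otherwise fall back to the global one."""
    by_provider = cooldowns.get("billingBackoffHoursByProvider", {})
    if provider in by_provider:
        return by_provider[provider]
    return cooldowns.get("billingBackoffHours", 5)  # 5 is the example value


def auth_permanent_backoff_minutes(cooldowns: dict, failures: int):
    """Assumed doubling from the start value, capped at the maximum."""
    if failures <= 0:
        return 0
    start = cooldowns.get("authPermanentBackoffMinutes", 10)
    cap = cooldowns.get("authPermanentMaxMinutes", 60)
    return min(start * 2 ** (failures - 1), cap)


cooldowns = {
    "billingBackoffHours": 5,
    "billingBackoffHoursByProvider": {"openai": 3},
    "authPermanentBackoffMinutes": 10,
    "authPermanentMaxMinutes": 60,
}
```

Here `billing_start_hours(cooldowns, "openai")` resolves to the override of 3 hours, while any provider without an entry falls back to 5 hours.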

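## 11.2.3 Acceptance and Troubleshooting

* If fallback happens frequently while the primary path looks healthy, first check `cooldownUntil`, `errorCount`, and the disable reason in the runtime auth state to confirm whether some profile keeps triggering cooldown.
* If billing disable keeps triggering, check `disabledReason` and the `auth.cooldowns.*` configuration, and assess whether the initial backoff duration needs adjusting.
* If the fault has recovered but traffic stays on the fallback path for a long time, check whether the `cooldownUntil` timestamp has expired; if it has expired and the profile still has not recovered, verify actively with the probe command.

Example commands: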

```bash
openclaw models status
openclaw models status --probe
openclaw status --deep
```
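
Once a runtime state record is in hand (how to obtain it depends on the version and is not shown here), the checks in the list above can be sketched as a small classifier. The function name and the status strings are hypothetical; the field that marks the end of a disable window is not named in this section, so the sketch reports only the disable reason.

```python
def summarize_profile(stats: dict, now_ms: int) -> str:
    """Classify one runtime-state record using the fields named above."""
    reason = stats.get("disabledReason")
    if reason:
        return f"disabled:{reason}"  # e.g. billing disable, check auth.cooldowns.*
    cooldown_until = stats.get("cooldownUntil", 0)
    if cooldown_until > now_ms:
        remaining_s = (cooldown_until - now_ms) // 1000
        return f"cooling:{remaining_s}s"  # still inside the cooldown window
    if stats.get("errorCount", 0) > 0:
        # Timestamp has passed but errors are recorded: a probe candidate.
        return "cooldown-expired"
    return "healthy"
```

A profile reported as `cooldown-expired` that still does not receive traffic is the case where running the probe command is the natural next step.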