
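# 13.5 Hands-On: Complete Testing for MiniHarness

This section builds a pyramid-shaped test suite for MiniHarness. The repository currently ships unit and integration tests; E2E and performance tests are planned as later layers.

> The complete code is in `lab/tests/`; this section focuses on test strategy and key cases.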

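## 13.5.1 Test Framework and Fixtures

pytest fixtures provide isolated test environments and reusable data. The code below sketches the target shape; the shared fixtures that actually exist in the Lab today are `tmp_dir` and `sample_tool_schema` in `lab/conftest.py`: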

```python
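import tempfile

import pytest

# SecureMiniHarness must also be imported from wherever the Lab defines it.


@pytest.fixture
def test_data_dir():
    """Temporary directory (cleaned up automatically)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        yield tmpdir


@pytest.fixture
def miniharness(test_data_dir):
    """Harness instance rooted in the temporary directory."""
    return SecureMiniHarness(base_path=test_data_dir)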
```

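In the repository itself, the full tests live mainly under `lab/tests/unit/`; E2E tests and higher-level wrappers can be added following the target shape above.

## 13.5.2 Unit Tests: Path Validation and Command Detection

Unit tests exercise the smallest testable units (functions or classes). Key cases for path validation: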

```python
class TestPathValidator:
    def test_path_traversal_attacks(self, test_data_dir):
        """The five-layer defense should block traversal under various encodings."""
        validator = PathValidator(base_path=test_data_dir)

        attacks = [
            "../../../etc/passwd",      # plain traversal
            "..%2f..%2fetc%2fpasswd",  # URL-encoded
            "..%252f..%252fetc",       # double-encoded
        ]

        for attack in attacks:
            with pytest.raises(ValueError):
                validator.validate(attack)
```
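
To make the traversal cases concrete, here is a minimal sketch of a validator that would reject all three inputs. It is not the Lab's five-layer `PathValidator`: the class name `SketchPathValidator`, the `max_decode_rounds` parameter, and the two-step approach (repeated URL decoding, then a containment check on the resolved path) are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import unquote


class SketchPathValidator:
    """Hypothetical stand-in for PathValidator, for illustration only."""

    def __init__(self, base_path: str, max_decode_rounds: int = 5):
        self.base = Path(base_path).resolve()
        self.max_decode_rounds = max_decode_rounds

    def validate(self, user_path: str) -> Path:
        # Decode repeatedly so double-encoded input (%252f -> %2f -> /)
        # is reduced to its plain form before any check runs.
        decoded = user_path
        for _ in range(self.max_decode_rounds):
            once = unquote(decoded)
            if once == decoded:
                break
            decoded = once
        else:
            # Still changing after the allowed rounds: refuse rather than guess.
            raise ValueError(f"too many encoding layers: {user_path!r}")

        # Resolve against the base directory, then require containment.
        candidate = (self.base / decoded).resolve()
        try:
            candidate.relative_to(self.base)
        except ValueError:
            raise ValueError(f"path escapes base directory: {user_path!r}") from None
        return candidate
```

Checking the resolved path instead of searching the raw string for `..` means encoded and absolute-path variants are handled by the same containment rule.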

Key case for command detection:

```python
class TestDangerousCommandDetector:
    def test_piped_dangerous_commands(self):
        """A dangerous command may be hidden behind a pipe."""
        detector = DangerousCommandDetector()

        # ls is safe, but the rm after the pipe is dangerous
        assert detector.detect("ls | rm -rf /") is True
```
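
The piped case passes only if the detector inspects every segment of the command line, not just the first program. A minimal sketch of that idea follows; the function name `is_dangerous`, the deny-list contents, and the separator set are assumptions for illustration and do not describe the Lab's `DangerousCommandDetector`.

```python
import re
import shlex

# Hypothetical deny-list; a real detector needs a much broader policy.
DANGEROUS_PROGRAMS = {"rm", "mkfs", "dd", "shutdown"}

# Split on pipes and command separators: ||, &&, |, ;, &
_SEGMENT_SPLIT = re.compile(r"\|\||&&|[|;&]")


def is_dangerous(command: str) -> bool:
    """Return True if any segment starts with a deny-listed program."""
    for segment in _SEGMENT_SPLIT.split(command):
        try:
            tokens = shlex.split(segment)
        except ValueError:
            # Unbalanced quotes: treat as dangerous rather than guess.
            return True
        # Compare the program's basename so /bin/rm is caught as rm.
        if tokens and tokens[0].rsplit("/", 1)[-1] in DANGEROUS_PROGRAMS:
            return True
    return False
```

Because the split ignores quoting, a separator inside a quoted string (for example `echo "a | b"`) is flagged conservatively. That trade-off keeps the sketch short; a production detector would tokenize first.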

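Key principle: write **many small, independent tests** rather than one giant test.

For the full implementation, see `lab/tests/unit/test_security.py` and `lab/tests/unit/test_tools.py`.

## 13.5.3 Integration Tests: Permission and Guardrail Flow

Integration tests verify that multiple components work together. Permission plus guardrail integration: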

```python
@pytest.mark.asyncio
async def test_permission_and_guardrail_integration(miniharness):
    """Full flow: permission decision -> guardrail check -> execution."""

    # A dangerous command is blocked (guardrail layer)
    with pytest.raises(PermissionError):
        await miniharness.execute_tool(
            tool_name='bash_exec',
            args={'command': 'rm -rf /'},
            user_id='user1'
        )

    # A safe command is allowed
    result = await miniharness.execute_tool(
        tool_name='bash_exec',
        args={'command': 'echo test'},
        user_id='user1'
    )
    assert result['status'] == 'success'
```

The corresponding real tests are split across `lab/tests/unit/test_security.py`, `lab/tests/unit/test_tools.py`, and `lab/tests/unit/test_core.py`; the case shown here is a target for the integration layer.
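
The ordering that the integration case relies on (permission decision, then guardrail check, then execution) can be sketched with a small stand-in harness. Everything here is hypothetical: `SketchHarness`, the `ALLOWED_TOOLS` table, the `BLOCKED_SUBSTRINGS` rule, and the stubbed execution step are illustrative and are not the Lab's `SecureMiniHarness`.

```python
import asyncio

ALLOWED_TOOLS = {"user1": {"bash_exec"}}  # hypothetical permission table
BLOCKED_SUBSTRINGS = ("rm -rf",)          # hypothetical guardrail rule


class SketchHarness:
    """Hypothetical harness showing permission -> guardrail -> execution."""

    async def execute_tool(self, tool_name: str, args: dict, user_id: str) -> dict:
        # 1. Permission decision: may this user call this tool at all?
        if tool_name not in ALLOWED_TOOLS.get(user_id, set()):
            raise PermissionError(f"{user_id} may not call {tool_name}")

        # 2. Guardrail check: inspect the concrete arguments.
        command = args.get("command", "")
        if any(bad in command for bad in BLOCKED_SUBSTRINGS):
            raise PermissionError(f"guardrail blocked command: {command!r}")

        # 3. Execution (stubbed; a real harness would run the tool here).
        await asyncio.sleep(0)
        return {"status": "success", "tool": tool_name, "command": command}
```

Running the permission check before the guardrail means an unauthorized caller is rejected without the harness ever inspecting, or leaking details about, the arguments.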

## 13.5.4 End-to-End Tests: Multi-Step Workflows

E2E tests exercise real user workflows. A file-analysis flow:

```python
@pytest.mark.asyncio
async def test_file_read_and_analyze_workflow(miniharness, sample_files):
    """Read a file -> analyze it -> return the result."""

    # Step 1: read the file
    content = await miniharness.read_file(sample_files['text'])

    # Step 2: call the analysis tool
    result = await miniharness.analyze_text(content)

    # Step 3: verify the result
    assert 'summary' in result
```
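
The workflow test depends on a `sample_files` fixture that this section does not define. One possible shape is a plain helper that writes sample inputs into a directory and returns their paths by kind. The helper name `make_sample_files`, the file name, and the file contents are assumptions; only the `'text'` key comes from the test above.

```python
from pathlib import Path


def make_sample_files(base_dir: str) -> dict:
    """Create small sample inputs under base_dir and return their paths by kind."""
    base = Path(base_dir)
    text_file = base / "sample.txt"
    text_file.write_text("MiniHarness sample text.\nSecond line.\n", encoding="utf-8")
    return {"text": str(text_file)}


# In conftest.py this could be wrapped as a fixture, for example:
#
# @pytest.fixture
# def sample_files(test_data_dir):
#     return make_sample_files(test_data_dir)
```

Keeping the file creation in a plain function makes it reusable outside pytest, for instance in a benchmark script.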

Concurrent workflow:

```python
@pytest.mark.asyncio
async def test_concurrent_file_reads(miniharness, sample_files):
    """Concurrent reads, as if from multiple users."""

    # Build ten read coroutines and run them concurrently
    tasks = [
        miniharness.read_file(sample_files['text'])
        for _ in range(10)
    ]
    results = await asyncio.gather(*tasks)

    assert all(r is not None for r in results)
```

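The repository does not yet provide standalone E2E test files. When adding them, place them in `lab/tests/integration/` or a new `lab/tests/e2e/` directory, and mark them as slow cases that CI runs separately.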
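
Marking slow cases is commonly done with custom pytest markers registered in `conftest.py`. The sketch below assumes the marker names `slow` and `e2e`; these names are illustrative and are not taken from the Lab.

```python
# conftest.py (sketch): register custom markers so that selecting or
# excluding them does not trigger unknown-marker warnings.

def pytest_configure(config):
    # Marker names below are assumptions chosen for this example.
    config.addinivalue_line(
        "markers", "slow: long-running end-to-end or benchmark tests"
    )
    config.addinivalue_line(
        "markers", "e2e: multi-step workflow tests"
    )
```

With this in place, a test decorated with `@pytest.mark.slow` can be excluded from the everyday run with `pytest -m "not slow"` and selected in a separate CI job with `pytest -m slow`.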

## 13.5.5 Performance Benchmarks: Latency and Throughput

Performance tests measure system capability metrics. Latency benchmark:

```python
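@pytest.mark.asyncio
async def test_tool_execution_latency(miniharness):
    """Tool execution latency (targets: mean < 50 ms, P99 < 100 ms)."""

    latencies = []
    for _ in range(100):
        start = time.perf_counter()  # monotonic, high-resolution clock
        await miniharness.execute_tool(tool_name='echo', args={...})
        latencies.append((time.perf_counter() - start) * 1000)

    avg = statistics.mean(latencies)
    # Nearest-rank P99 of 100 samples is the 99th smallest value (index 98);
    # index 99 would be the maximum.
    p99 = sorted(latencies)[98]

    assert avg < 50, f"mean latency too high: {avg:.1f} ms"
    assert p99 < 100, f"P99 latency too high: {p99:.1f} ms"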
```

Throughput benchmark:

```python
@pytest.mark.asyncio
async def test_permission_decision_throughput(miniharness):
    """Permission decision throughput (target: > 1000/s)."""

    start = time.perf_counter()
    count = 0

    # Count how many decisions complete within a one-second window
    while time.perf_counter() - start < 1.0:
        await miniharness.permission_engine.decide(...)
        count += 1

    assert count > 1000, f"throughput too low: {count}/s"
```

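The repository does not yet provide standalone performance test files. When adding them, pin down the hardware, sample size, and thresholds explicitly, and run them separately from functional tests so that environmental noise does not affect everyday CI.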
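
Percentile thresholds are only meaningful if the percentile is computed the same way every time. Under the nearest-rank definition, the P99 of 100 samples is the 99th smallest value (index 98 after sorting), while index 99 is the maximum. The helper names `nearest_rank_percentile` and `summarize_latencies` below are hypothetical, and nearest-rank is one of several accepted percentile definitions.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Smallest sample with at least pct percent of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Collect the figures a latency benchmark would typically assert on."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": nearest_rank_percentile(latencies_ms, 50),
        "p99": nearest_rank_percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }
```

Reporting the maximum alongside P99 makes it obvious when a single outlier, not a systematic slowdown, is responsible for a failed threshold.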

## 13.5.6 Running Tests and Coverage

Commands:

```bash
# Run all tests
pytest tests/ -v

# Run by category
pytest tests/unit -v
pytest tests/integration -v

# Generate a coverage report (requires pytest-cov)
python -m pip install pytest-cov
pytest tests/unit tests/integration --cov=mini_harness --cov-report=html
```

Target coverage: above 80%. For shared fixtures, see `lab/conftest.py`.

## 13.5.7 Summary

MiniHarness test strategy:

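| Layer       | Status      | Speed      | Cost   | Examples                           |
| ----------- | ----------- | ---------- | ------ | ---------------------------------- |
| Unit        | Implemented | Fast       | Low    | Path validation, command detection |
| Integration | Implemented | Medium     | Medium | Permission + guardrail flow        |
| E2E         | Planned     | Slow       | High   | Complete workflows                 |
| Performance | Planned     | Continuous | Medium | Latency and throughput benchmarks  |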

The test pyramid: **fast feedback** (unit) → **integration assurance** → **realistic validation** (E2E) → **performance monitoring**.