# Appendix C: References

This appendix lists the main sources consulted in writing this book.

## Standards and Frameworks

1. OWASP. *OWASP Top 10 for LLM Applications* (project home page). [OWASP](https://genai.owasp.org/)
2. NIST. (2023). *Artificial Intelligence Risk Management Framework (AI RMF 1.0)*. [NIST](https://www.nist.gov/itl/ai-risk-management-framework)
3. European Union. (2024). *Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act)*. [EUR-Lex](https://eur-lex.europa.eu/eli/reg/2024/1689/oj)
4. MITRE. (2024). *ATLAS (Adversarial Threat Landscape for AI Systems)*. [MITRE ATLAS](https://atlas.mitre.org/)

## Research Papers

### Safety Alignment

5. Ouyang, L., et al. (2022). *Training language models to follow instructions with human feedback*. NeurIPS 2022.
6. Bai, Y., et al. (2022). *Constitutional AI: Harmlessness from AI Feedback*. arXiv preprint.
7. Rafailov, R., et al. (2023). *Direct Preference Optimization: Your Language Model is Secretly a Reward Model*. NeurIPS 2023.

### Attack Techniques

8. Schulhoff, S., Pinto, J., Khan, A., et al. (2023). *Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition*. EMNLP 2023. [ACL Anthology](https://aclanthology.org/2023.emnlp-main.302/)
9. Greshake, K., et al. (2023). *Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection*. AISec 2023.
10. Zou, A., et al. (2023). *Universal and Transferable Adversarial Attacks on Aligned Language Models*. arXiv preprint.
11. Wei, A., et al. (2023). *Jailbroken: How Does LLM Safety Training Fail?*. NeurIPS 2023.

### Defense Techniques

12. Guo, C., et al. (2026). *IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs*. arXiv preprint. [arXiv:2603.10521](https://arxiv.org/abs/2603.10521)

### Privacy and Data Security

13. Carlini, N., et al. (2021). *Extracting Training Data from Large Language Models*. USENIX Security 2021.
14. Carlini, N., et al. (2023). *Quantifying Memorization Across Neural Language Models*. ICLR 2023.
15. Nasr, M., et al. (2023). *Scalable Extraction of Training Data from (Production) Language Models*. arXiv preprint.

### Model Security

16. Goldblum, M., et al. (2022). *Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses*. IEEE TPAMI.
17. Tramèr, F., et al. (2016). *Stealing Machine Learning Models via Prediction APIs*. USENIX Security 2016.

## Technical Reports and White Papers

18. OpenAI. (2023). *GPT-4 System Card*. OpenAI Technical Report.
19. Anthropic. (2023). *Claude's Constitution*. [Anthropic](https://www.anthropic.com/news/claudes-constitution/)
20. Google. (2023). *Secure AI Framework (SAIF)*. [Google Cloud](https://cloud.google.com/use-cases/secure-ai-framework)
21. Microsoft. (2022). *Responsible AI Standard, v2*. [Microsoft](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/bade/documents/products-and-services/en-us/ai/RAIS-Reference-Guide-v2.pdf)

## Books

22. Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep Learning*. MIT Press.

## Supplementary Papers

23. Kurakin, A., Goodfellow, I., & Bengio, S. (2017). *Adversarial examples in the physical world*. ICLR Workshop.

## Industry Reports

24. Gartner. (2025). *Hype Cycle for Artificial Intelligence*.
25. McKinsey. (2025). *The State of AI in 2025*.
26. Stanford HAI. (2025). *Artificial Intelligence Index Report 2025*.

## Chinese Academic References

### Surveys and Reports

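27. Tsinghua University AIR team. *LLM 安全综述相关工作* (related work on LLM security surveys). Tsinghua University Institute for Artificial Intelligence.
28. China Academy of Information and Communications Technology (CAICT). (2024). *大模型安全报告* (Large Model Security Report). [CAICT](https://www.caict.ac.cn/)
29. National Artificial Intelligence Standardization General Group. *AI 安全标准相关文件* (documents related to AI security standards). [Standardization Administration of China](https://www.sac.gov.cn/)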

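**Chinese academic resources**: Readers should consult Chinese academic databases such as CNKI (China National Knowledge Infrastructure) and Wanfang Data for the latest research papers and industry reports on LLM security.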

## Laws, Regulations, and Policy

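30. Cyberspace Administration of China. (2023). *生成式人工智能服务管理暂行办法* (Interim Measures for the Administration of Generative Artificial Intelligence Services).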
31. Federal Register. (2023). *Executive Order 14110: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence* (historical document; revoked on January 20, 2025 by Executive Order 14148, after which Executive Order 14179 ordered a review of actions taken under it). [Federal Register](https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence)
32. UK AI Safety Institute. (2024). *International Scientific Report on the Safety of Advanced AI*.

## Online Resources

33. Hugging Face. *Security Documentation*. [docs/hub](https://huggingface.co/docs/hub/security)
34. LangChain. *Security Policy*. [LangChain Docs](https://docs.langchain.com/oss/python/security-policy)
35. OpenAI. *Safety & Alignment*. [OpenAI Safety](https://openai.com/safety)

## Incident Reports and Regulatory Updates

36. OpenAI. (2023). *March 20 ChatGPT outage: Here's what happened*. [OpenAI Research](https://openai.com/research/march-20-chatgpt-outage)
37. The White House. (2025). *Executive Order 14179: Removing Barriers to American Leadership in Artificial Intelligence*. [White House](https://www.whitehouse.gov/presidential-actions/2025/01/removing-barriers-to-american-leadership-in-artificial-intelligence/)
38. Federal Register. (2025). *Executive Order 14179: Removing Barriers to American Leadership in Artificial Intelligence*. [Federal Register](https://www.federalregister.gov/documents/2025/01/31/2025-02172/removing-barriers-to-american-leadership-in-artificial-intelligence)

## 2025-2026 Updates

39. OWASP. (2025). *Top 10 for LLM Applications 2025*. [OWASP](https://genai.owasp.org/llm-top-10/)
40. European Commission. (2026). *AI Act implementation timeline*. [European Commission](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)
41. NIST. (2024). *Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1)*. [NIST](https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence)
42. Model Context Protocol. (2025). *Security Best Practices*. [Model Context Protocol](https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices)
43. OpenAI. (2025). *New tools and features in the Responses API* (includes MCP tool support). [OpenAI Blog](https://openai.com/index/new-tools-and-features-in-the-responses-api/)
44. Willison, S. (2023). *Delimiters won't save you from prompt injection*. [Simon Willison](https://simonwillison.net/2023/May/11/delimiters-wont-save-you/)
45. GitHub. (2025). *protectai/rebuff repository page* (marked as archived). [protectai/rebuff](https://github.com/protectai/rebuff)
46. Anthropic. (2025). *Agentic Misalignment: How LLMs could be insider threats*. [Anthropic Research](https://www.anthropic.com/research/agentic-misalignment)
47. Aim Security. (2025). *Breaking down ‘EchoLeak’, the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot*. [Aim Security](https://www.aim.security/post/echoleak-blogpost)
48. GitHub. (2025). *Arbitrary code execution from Cursor Agent through a prompt injection via MCP Special Files*. [GitHub Security Advisories](https://github.com/cursor/cursor/security/advisories/GHSA-4cxx-hrm3-49rm)
49. Anthropic. (2026). *Trustworthy agents in practice*. [Anthropic Research](https://www.anthropic.com/research/trustworthy-agents)
50. Vaswani, A., et al. (2017). *Attention Is All You Need*. NeurIPS 2017. [arXiv](https://arxiv.org/abs/1706.03762)
51. 0DIN / Marco Figueroa. (2025). *ChatGPT guessing game leads to users extracting free Windows OS keys & more*. [0DIN](https://0din.ai/blog/chatgpt-guessing-game-leads-to-users-extracting-free-windows-os-keys-more)
52. Malwarebytes. (2026). *"Reprompt" attack lets attackers steal data from Microsoft Copilot*. [Malwarebytes Labs](https://www.malwarebytes.com/blog/news/2026/01/reprompt-attack-lets-attackers-steal-data-from-microsoft-copilot)
53. PCWorld. (2026). *Her AI agent nuked 200 emails. This guardrail stops the next disaster*. [PCWorld](https://www.pcworld.com/article/3070207/an-ai-agent-nuked-200-emails-this-guardrail-stops-the-next-disaster.html)
54. Cline. (2026). *Post-mortem: Unauthorized Cline CLI npm publish on February 17, 2026*. [Cline Blog](https://cline.bot/blog/post-mortem-unauthorized-cline-cli-npm)
55. Anthropic. (2024). *Many-shot jailbreaking*. [Anthropic Research](https://www.anthropic.com/research/many-shot-jailbreaking)
56. Microsoft. (2024). *Mitigating Skeleton Key, a new type of generative AI jailbreak technique*. [Microsoft Security Blog](https://www.microsoft.com/en-us/security/blog/2024/06/26/mitigating-skeleton-key-a-new-type-of-generative-ai-jailbreak-technique/)
57. Anthropic. (2026). *Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks*. [Anthropic Research](https://www.anthropic.com/research/next-generation-constitutional-classifiers)
58. NVIDIA. *garak: the LLM vulnerability scanner*. [GitHub](https://github.com/NVIDIA/garak)
59. Microsoft. *PyRIT Documentation*. [PyRIT](https://microsoft.github.io/PyRIT/)
60. Center for AI Safety. *HarmBench*. [GitHub](https://github.com/centerforaisafety/HarmBench)
61. GitHub. (2025). *Bypass re-approval for modified MCP configuration in Cursor*. [GitHub Security Advisories](https://github.com/cursor/cursor/security/advisories/GHSA-24mc-g4xr-4395)
62. McGraw, G., Figueroa, H., McMahon, K., & Bonett, R. (2026). *No Security Meter for AI*. Berryville Institute of Machine Learning (BIML). [BIML](https://berryvilleiml.com/docs/no-security-meter-ai.pdf)
63. McGraw, G., Figueroa, H., Bonett, R., & McMahon, K. (2024). *An Architectural Risk Analysis of Large Language Models: Applied Machine Learning Security*. Berryville Institute of Machine Learning (BIML). [BIML](https://berryvilleiml.com/results/BIML-LLM24.pdf)

## Agent Security Evaluation Benchmarks

64. Debenedetti, E., et al. (2024). *AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents*. [arXiv:2406.13352](https://arxiv.org/abs/2406.13352)
65. Zhan, Q., Liang, Z., Ying, Z., & Kang, D. (2024). *InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents*. [arXiv:2403.02691](https://arxiv.org/abs/2403.02691)
66. Andriushchenko, M., et al. (2024). *AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents*. ICLR 2025. [arXiv:2410.09024](https://arxiv.org/abs/2410.09024)

## RAG and Embedding Security

67. Zou, W., Geng, R., Wang, B., & Jia, J. (2024). *PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models*. USENIX Security 2025. [arXiv:2402.07867](https://arxiv.org/abs/2402.07867)
68. Morris, J. X., Kuleshov, V., Shmatikov, V., & Rush, A. M. (2023). *Text Embeddings Reveal (Almost) As Much As Text*. EMNLP 2023. [arXiv:2310.06816](https://arxiv.org/abs/2310.06816)

## MCP and Tool Ecosystem Security

69. Invariant Labs. (2025). *MCP Security Notification: Tool Poisoning Attacks*. [Invariant Labs](https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks)

## Model Artifact and Deserialization Security

70. ReversingLabs. (2025). *Malicious ML models discovered on Hugging Face platform (nullifAI)*. [ReversingLabs](https://www.reversinglabs.com/blog/rl-identifies-malware-ml-model-hosted-on-hugging-face)

## Agents and Frontier Threats

71. Cohen, S., Bitton, R., & Nassi, B. (2024). *Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications*（Morris II / ComPromptMized）. [arXiv:2403.02817](https://arxiv.org/abs/2403.02817)
72. van der Weij, T., Hofstätter, F., Jaffe, O., Brown, S. F., & Ward, F. R. (2024). *AI Sandbagging: Language Models can Strategically Underperform on Evaluations*. [arXiv:2406.07358](https://arxiv.org/abs/2406.07358)
73. *EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection*. (2025). [arXiv:2505.14289](https://arxiv.org/abs/2505.14289)

## Frontier Safety Governance Frameworks

74. Anthropic. (2024). *Anthropic's Responsible Scaling Policy*. [Anthropic](https://www.anthropic.com/news/anthropics-responsible-scaling-policy)
75. OpenAI. (2025). *Updating our Preparedness Framework (Version 2)*. [OpenAI](https://openai.com/index/updating-our-preparedness-framework/)
76. Google DeepMind. (2025). *Frontier Safety Framework (Version 2.0)*. [Google DeepMind](https://deepmind.google/blog/updating-the-frontier-safety-framework/)

## Agentic Threat Taxonomy

77. OWASP. (2025). *Agentic AI – Threats and Mitigations*. [OWASP](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/)

## RLHF and Preference Data Security

78. *BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT*. (2023). [arXiv:2304.12298](https://arxiv.org/abs/2304.12298)
79. Rando, J., & Tramèr, F. (2023). *Universal Jailbreak Backdoors from Poisoned Human Feedback*. [arXiv:2311.14455](https://arxiv.org/abs/2311.14455)

***

*References change over time and will continue to be updated in future editions.*

