Appendix C: References

This appendix lists the principal sources consulted in the preparation of this book.

Standards and Frameworks

  1. OWASP. OWASP Top 10 for LLM Applications (project homepage).

  2. NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0).

  3. European Union. (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). EUR-Lex.

  4. MITRE. (2024). ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems).

Research Papers

Safety Alignment

  1. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS 2022.

  2. Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv preprint.

  3. Rafailov, R., et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS 2023.

Attack Techniques

  1. Schulhoff, S., et al. (2023). Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition. EMNLP 2023. Available via arXiv (arxiv.org) and the ACL Anthology (aclanthology.org).

  2. Greshake, K., et al. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec 2023.

  3. Zou, A., et al. (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv preprint.

  4. Wei, A., et al. (2023). Jailbroken: How Does LLM Safety Training Fail? NeurIPS 2023.

Privacy and Data Security

  1. Carlini, N., et al. (2021). Extracting Training Data from Large Language Models. USENIX Security 2021.

  2. Carlini, N., et al. (2023). Quantifying Memorization Across Neural Language Models. ICLR 2023.

  3. Nasr, M., et al. (2023). Scalable Extraction of Training Data from (Production) Language Models. arXiv preprint.

Model Security

  1. Goldblum, M., et al. (2022). Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses. IEEE TPAMI.

  2. Tramèr, F., et al. (2016). Stealing Machine Learning Models via Prediction APIs. USENIX Security 2016.

Technical Reports and White Papers

  1. OpenAI. (2023). GPT-4 System Card. OpenAI Technical Report.

  2. Anthropic. (2024). Claude's Constitution. Anthropic Research.

  3. Google. (2023). Secure AI Framework (SAIF). Google Security Blog.

  4. Microsoft. (2022). Responsible AI Standard, v2. Microsoft.

Books

  1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

Supplementary Papers

  1. Kurakin, A., Goodfellow, I., & Bengio, S. (2017). Adversarial examples in the physical world. ICLR Workshop.

Industry Reports

  1. Gartner. (2025). Hype Cycle for Artificial Intelligence.

  2. McKinsey. (2025). The State of AI in 2025.

  3. Stanford HAI. (2025). Artificial Intelligence Index Report 2025.

Chinese Academic References

Surveys and Reports

  1. Tsinghua University AIR team. Work on LLM security surveys. Institute for AI Industry Research (AIR), Tsinghua University.

  2. China Academy of Information and Communications Technology (CAICT). (2024). Large Model Security Report.

  3. National AI Standardization General Group. Documents related to AI security standards. Standardization Administration of China.

Chinese academic resources: readers should consult Chinese academic databases such as CNKI and Wanfang Data for the latest LLM-security research papers and industry reports.

Regulations and Policies

  1. Cyberspace Administration of China. (2023). Interim Measures for the Management of Generative Artificial Intelligence Services.

  2. White House. (2023). Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.

  3. UK AI Safety Institute. (2024). International Scientific Report on the Safety of Advanced AI.

Online Resources

  1. Hugging Face. Security Documentation (docs/hub).

  2. LangChain. Security Best Practices. python.langchain.com.

  3. OpenAI. Safety & Alignment.

Incident Reports and Regulatory Updates

  1. OpenAI. (2023). March 20 ChatGPT outage: Here's what happened.

  2. The White House. (2025). Executive Order 14179: Removing Barriers to American Leadership in Artificial Intelligence.

  3. Federal Register. (2025). Executive Order 14179 filing and revocations.

2025–2026 Updates and Additions

  1. OWASP. (2025). OWASP Top 10 for LLM Applications 2025.

  2. European Commission. (2025). AI Act implementation timeline.

  3. NIST. (2024). Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1).

  4. Model Context Protocol. (2025). Security Best Practices.

  5. OpenAI. (2025). New tools and features in the Responses API (including MCP tool support). OpenAI Blog.

  6. Willison, S. (2023). Delimiters won't save you from prompt injection. Simon Willison (blog).

  7. GitHub. (2025). protectai/rebuff repository page (marked as archived).

  8. NSFOCUS. (2025). LLM Data Leakage Incidents: ChatGPT Keyword Filter Bypass via Word Games. NSFOCUS Global.

  9. Aim Security. (2025). EchoLeak: Zero-Click Agentic Vulnerability in Microsoft 365 Copilot.

  10. NSFOCUS. (2025). Cursor IDE Prompt Injection Vulnerabilities (CVE-2025-54135, CVE-2025-54136). NSFOCUS Global.

  11. Security Boulevard. (2026). Microsoft Copilot Reprompt Attack: Session Hijacking via URL Prompt Parameter Manipulation.

  12. PCMag. (2026). Meta AI Agent Deletes 200+ Emails After Context Window Compaction Loses Safety Instructions.

  13. Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS 2017.


References are current as of 2026-02-25; they change over time and will be updated in subsequent editions.
