Appendix C: References
This appendix lists the principal sources consulted in writing this book.
Standards and Frameworks
OWASP. OWASP Top 10 for LLM Applications (project homepage). OWASP.
NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST.
European Union. (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). EUR-Lex.
MITRE. (2024). ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). MITRE ATLAS.
Research Papers
Safety Alignment
Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS 2022.
Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv preprint.
Rafailov, R., et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS 2023.
Attack Techniques
Schulhoff, S., et al. (2023). Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition. EMNLP 2023.
Greshake, K., et al. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec 2023.
Zou, A., et al. (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv preprint.
Wei, A., et al. (2023). Jailbroken: How Does LLM Safety Training Fail? NeurIPS 2023.
Privacy and Data Security
Carlini, N., et al. (2021). Extracting Training Data from Large Language Models. USENIX Security 2021.
Carlini, N., et al. (2023). Quantifying Memorization Across Neural Language Models. ICLR 2023.
Nasr, M., et al. (2023). Scalable Extraction of Training Data from (Production) Language Models. arXiv preprint.
Model Security
Goldblum, M., et al. (2022). Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses. IEEE TPAMI.
Tramèr, F., et al. (2016). Stealing Machine Learning Models via Prediction APIs. USENIX Security 2016.
Technical Reports and White Papers
OpenAI. (2023). GPT-4 System Card. OpenAI Technical Report.
Anthropic. (2024). Claude's Constitution. Anthropic Research.
Google. (2023). Secure AI Framework (SAIF). Google Security Blog.
Microsoft. (2022). Responsible AI Standard, v2. Microsoft.
Books
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Supplementary Papers
Kurakin, A., Goodfellow, I., & Bengio, S. (2017). Adversarial examples in the physical world. ICLR Workshop.
Industry Reports
Gartner. (2025). Hype Cycle for Artificial Intelligence.
McKinsey. (2025). The State of AI in 2025.
Stanford HAI. (2025). Artificial Intelligence Index Report 2025.
Chinese-Language Academic References
Surveys and Reports
Tsinghua University AIR team. Work on LLM security surveys. Institute for AI Industry Research (AIR), Tsinghua University.
CAICT. (2024). 大模型安全报告 [Large Model Security Report]. China Academy of Information and Communications Technology.
National AI Standardization General Group. Documents related to AI safety standards. Standardization Administration of China.
Chinese-language academic resources: readers should consult Chinese academic databases such as CNKI and Wanfang Data for the latest research papers and industry reports on LLM security.
Regulations and Policy
Cyberspace Administration of China. (2023). 生成式人工智能服务管理暂行办法 [Interim Measures for the Management of Generative Artificial Intelligence Services].
White House. (2023). Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.
UK AI Safety Institute. (2024). International Scientific Report on the Safety of Advanced AI.
Online Resources
Hugging Face. Security Documentation. docs/hub.
LangChain. Security Best Practices. python.langchain.com.
OpenAI. Safety & Alignment. OpenAI Safety.
Incident Reports and Regulatory Updates
OpenAI. (2023). March 20 ChatGPT outage: Here's what happened. OpenAI Blog.
The White House. (2025). Executive Order 14179: Removing Barriers to American Leadership in Artificial Intelligence. White House.
Federal Register. (2025). Executive Order 14179 filing and revocations. Federal Register.
2025–2026 Updates
OWASP. (2025). Top 10 for LLM Applications 2025. OWASP.
European Commission. (2025). AI Act implementation timeline. European Commission.
NIST. (2024). Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1). NIST.
Model Context Protocol. (2025). Security Best Practices. Model Context Protocol.
OpenAI. (2025). New tools and features in the Responses API (including MCP tool support). OpenAI Blog.
Willison, S. (2023). Delimiters won't save you from prompt injection. Simon Willison's Weblog.
GitHub. (2025). protectai/rebuff repository page (marked as archived). protectai/rebuff.
NSFOCUS. (2025). LLM Data Leakage Incidents: ChatGPT Keyword Filter Bypass via Word Games. NSFOCUS Global.
Aim Security. (2025). EchoLeak: Zero-Click Agentic Vulnerability in Microsoft 365 Copilot. Aim Security.
NSFOCUS. (2025). Cursor IDE Prompt Injection Vulnerabilities (CVE-2025-54135, CVE-2025-54136). NSFOCUS Global.
Security Boulevard. (2026). Microsoft Copilot Reprompt Attack: Session Hijacking via URL Prompt Parameter Manipulation. Security Boulevard.
PCMag. (2026). Meta AI Agent Deletes 200+ Emails After Context Window Compaction Loses Safety Instructions. PCMag.
Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS 2017. arXiv.
References are current as of 2026-02-25 and will change over time; subsequent editions will be updated accordingly.
