I. Introduction
This article introduces methods for red teaming AI systems and the lessons learned from applying them, including the advantages and challenges of each method. It is intended as a reference for other companies, policymakers, and organizations, and emphasizes the importance of establishing standardized red teaming practices.
II. Red Teaming Methods and Challenges
Domain-Specific Expert Red Teaming
Collaborating with domain experts to identify and assess potential risks in AI systems provides deep insight into complex, specialized issues.
Trust and Safety: Policy Vulnerability Testing (PVT)
For high-risk threats, such as those that could cause serious harm to individuals or society, "Policy Vulnerability Testing" (PVT) is used: in-depth, qualitative testing conducted with external experts across the topic areas covered by usage policies.
Challenges: Restrictions on handling sensitive information can hinder risk assessment and mitigation, and external collaboration carries operational costs.
National Security: Frontier Threats Red Teaming
Frontier threats red teaming targets national security risks such as chemical, biological, radiological, and nuclear (CBRN) threats, cybersecurity, and autonomous AI risks. It involves collaborating with experts to test systems and to co-design new assessment methods.
Advantages: In-depth study of threats, participation of subject matter experts, and public-private cooperation.
Challenges: Handling sensitive information is subject to restrictions, and there are operational costs.
Region-Specific: Multilingual and Multicultural Red Teaming
Collaborating with partner institutions in Singapore to conduct red teaming across multiple languages and cultural contexts, testing the AI system's language abilities and its handling of culturally specific topics; a sketch of such a test pass follows below.
Challenges: Language and cultural differences complicate test design and the evaluation of results.
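The following is a minimal sketch of how such a multilingual test pass might be organized. The language list, prompt-file format, and the query_model and is_refusal helpers are illustrative assumptions, not details from the original collaboration.

```python
# Hypothetical harness: run the same translated red-team prompt set across languages
# and tally refusal rates per language for later human review.
import json
from collections import defaultdict

LANGUAGES = ["en", "ms", "zh", "ta"]  # example locales; adjust to the target regions

def query_model(prompt: str, lang: str) -> str:
    """Placeholder: send `prompt` to the system under test and return its reply."""
    raise NotImplementedError

def is_refusal(reply: str) -> bool:
    """Very rough refusal heuristic; real evaluations need human or model review."""
    return any(marker in reply.lower() for marker in ("i can't", "i cannot", "i won't"))

def run_multilingual_suite(prompt_file: str) -> dict:
    """Expects a JSON file shaped like {"en": [...], "ms": [...], ...}."""
    with open(prompt_file) as f:
        prompts = json.load(f)
    stats = defaultdict(lambda: {"total": 0, "refused": 0})
    for lang in LANGUAGES:
        for prompt in prompts.get(lang, []):
            reply = query_model(prompt, lang)
            stats[lang]["total"] += 1
            stats[lang]["refused"] += int(is_refusal(reply))
    return dict(stats)
```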
AI-Driven Red Teaming
Using AI systems to automatically generate adversarial examples that probe the robustness of other models, supplementing manual testing; a sketch follows below.
Challenges: Effectiveness depends on the capabilities and accuracy of the generating model.
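A minimal sketch of the idea, assuming two generic model wrappers: an attacker model proposes adversarial prompts around each seed topic, and the target model's responses are collected for later review. The function names and prompt template are hypothetical.

```python
# Sketch: one model generates candidate adversarial prompts, another is tested on them.
# attacker_generate() and target_respond() are hypothetical wrappers around model APIs.
from typing import Callable

def ai_driven_red_team(
    seed_topics: list[str],
    attacker_generate: Callable[[str], list[str]],
    target_respond: Callable[[str], str],
    variants_per_topic: int = 5,
) -> list[dict]:
    """Collect (topic, adversarial prompt, target response) records for human review."""
    transcripts = []
    for topic in seed_topics:
        instruction = (
            f"Write {variants_per_topic} prompts that try to elicit unsafe answers "
            f"about: {topic}"
        )
        for prompt in attacker_generate(instruction)[:variants_per_topic]:
            transcripts.append({
                "topic": topic,
                "adversarial_prompt": prompt,
                "target_response": target_respond(prompt),
            })
    return transcripts
```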
Automated Red Teaming
Adopting a red team/blue team dynamic: models generate attacks (red team), and the target model is then fine-tuned on those attacks to make it more robust (blue team); a schematic of the loop follows below.
Challenges: The model may overfit to the generated attacks, and it is difficult to cover all attack types.
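A schematic of that red team/blue team loop, with generate_attacks, elicits_harm, and fine_tune as hypothetical placeholders for the generation, grading, and training steps, which the article does not specify.

```python
# Schematic red-team/blue-team loop: generate attacks, keep the successful ones,
# fine-tune the target to handle them safely, then repeat.

def generate_attacks(red_model, n: int) -> list[str]:
    """Placeholder: have the red-team model propose n candidate attack prompts."""
    raise NotImplementedError

def elicits_harm(target_model, attack: str) -> bool:
    """Placeholder: return True if the target responds unsafely to the attack."""
    raise NotImplementedError

def fine_tune(target_model, attacks: list[str]):
    """Placeholder: fine-tune the target to respond safely to these attacks."""
    raise NotImplementedError

def red_blue_loop(target_model, red_model, rounds: int = 3, attacks_per_round: int = 100):
    """Alternate attack generation (red team) and robustness training (blue team)."""
    for i in range(rounds):
        attacks = generate_attacks(red_model, n=attacks_per_round)
        successful = [a for a in attacks if elicits_harm(target_model, a)]
        print(f"round {i}: {len(successful)}/{len(attacks)} attacks succeeded")
        if not successful:
            break  # no successful attacks left to train against
        target_model = fine_tune(target_model, successful)
    return target_model
```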
New Modality Red Teaming
Testing AI systems that accept multiple input modalities (such as images and audio) to identify new risks and failure modes.
Challenges: The diversity of input modalities increases testing complexity.
Multimodal Red Teaming (with Claude 3 as an example)
Claude 3 accepts visual inputs and produces text outputs; multimodal red teaming probes its image- and text-related risks as well as its ability to refuse harmful inputs. A sketch of an image-plus-text probe follows below.
Challenges: Multimodality brings new risks and testing difficulties.
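As an illustration, a single image-plus-text probe against a Claude 3 model might look like the sketch below, using the Anthropic Python SDK's Messages API. The model string, media type, and prompt are assumptions, and parameter details should be checked against current documentation.

```python
# Sketch: send one image-plus-text red-team probe to a Claude 3 model and return
# the text reply for human review. Assumes the Anthropic Python SDK and an
# ANTHROPIC_API_KEY in the environment; model name and media type are illustrative.
import base64
import anthropic

client = anthropic.Anthropic()

def probe_image(image_path: str, prompt: str,
                model: str = "claude-3-opus-20240229") -> str:
    with open(image_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")
    message = client.messages.create(
        model=model,
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",  # match the actual file type
                            "data": image_data}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return message.content[0].text
```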
Open-Ended, General Red Teaming
Includes crowdsourced red teaming (for general harms) and community red teaming (for general risks and system limitations). Crowdsourced red teaming lets crowdworkers probe the system according to their own judgment during the research phase; community red teaming engages a broader public through organized events.
Advantages: Public participation, identification of common harms, and educational opportunities.
Challenges: Lack of depth, operational costs, and unclear feedback loops.
III. From Qualitative to Quantitative Assessment
Red teaming can serve as a prelude to building automated, quantitative evaluation methods. The process typically moves from experts describing potential threat models and testing systems manually, to using language models to generate large numbers of variations on those inputs, turning qualitative findings into quantitative, automated tests. This iterative approach has been applied to work on national security and election integrity risks.
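A sketch of that pipeline under generic assumptions: expand_with_lm, query_target, and grade_response are hypothetical stand-ins for the model calls, and the metric shown (fraction of unsafe responses) is one illustrative choice.

```python
# Sketch of turning a small set of expert-written red-team prompts into a larger,
# automatically graded evaluation. The three helpers are hypothetical stand-ins.

def expand_with_lm(seed_prompt: str, n: int) -> list[str]:
    """Placeholder: ask a language model for n variations of the expert-written seed."""
    raise NotImplementedError

def query_target(prompt: str) -> str:
    """Placeholder: send a prompt to the system under evaluation."""
    raise NotImplementedError

def grade_response(prompt: str, response: str) -> bool:
    """Placeholder: automated grader (classifier or model) flags unsafe responses."""
    raise NotImplementedError

def quantitative_eval(expert_seeds: list[str], variations_per_seed: int = 50) -> float:
    """Return the fraction of generated test cases that produced unsafe responses."""
    cases = [v for seed in expert_seeds for v in expand_with_lm(seed, variations_per_seed)]
    unsafe = sum(grade_response(p, query_target(p)) for p in cases)
    return unsafe / len(cases) if cases else 0.0
```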
IV. Policy Recommendations
Fund relevant organizations to develop red teaming technical standards and general practices.
Fund independent organizations to collaborate with developers on red teaming AI systems.
Encourage the development of a professional red teaming service market and establish certification processes.
Encourage AI companies to allow third-party red teaming and to establish transparency and model-access standards.
Encourage AI companies to link red teaming practices with new model development or release policies.
V. Conclusion
Red teaming is an essential technique for identifying and mitigating risks in AI systems. Different methods suit different use cases and threat models. We look forward to collaborating and iterating on these techniques to establish shared testing standards and to build safe and beneficial AI systems.