Blog

Applying legal privilege protection for red teaming, assessing generative AI systems

Brenda Leong

Andrew Eichen

Ekene Chuks-Okeke

August 14, 2024

Attorneys increasingly play a crucial role in developing responsible artificial intelligence governance and managing identified risks around generative AI models, providing guidance earlier in product life cycles and engaging in the hands-on red teaming of generative AI systems.

Red teaming is not just a buzzword. It is a critical component of AI governance in emerging regulatory frameworks.

President Joe Biden's Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence calls for the U.S. National Institute of Standards and Technology to establish rigorous standards for AI red teaming and requires companies developing dual-use foundation models to report their red-team testing results to the federal government. Other laws, such as the Colorado AI Act and EU Artificial Intelligence Act, also incorporate these types of assessments.

Red-team testing is among the most important methods used to uncover specific vulnerabilities in large language models and other generative AI models. Protecting red-team testing results and related communications is a key concern for companies developing or incorporating these models, as they seek to limit the exposure of their testing procedures and any vulnerabilities discovered.

When lawyers are part of the testing process, one way to protect this information is through attorney-client privilege. So, understanding the nuances of privilege in this context is crucial for companies to properly protect information during the evaluation process.

What is red teaming?

Initially a term used in Cold War conflict simulations, red teaming was later adapted to cybersecurity testing that involves proactively attacking computer networks to measure resilience and identify vulnerabilities. Now in AI contexts, red teaming refers to systematically stress testing generative AI systems to discover problematic outputs and performance issues.

For generative AI red teaming, testers create structured input sequences aligned to specified topic areas to strain the AI system against its controlling rules and filters. Techniques may include overwhelming the system with content, injecting malicious code or devising creative prompts to induce inappropriate outputs.

Red teams often focus on testing for outputs that could create legal liabilities, such as misuse for illegal activities, copyright infringement and discriminatory texts or imagery. When undesirable outcomes are identified during red teaming exercises, developers can respond by including further safeguards to mitigate operations in real-world scenarios.

Responsible AI governance cannot occur without this type of assessment.

However, red teaming leaves a considerable paper trail when done correctly. Communication between testers, the documentation of testing steps, mitigation strategies employed and any conclusions or reports provide crucial information for developers to address vulnerabilities and improve their models.

Therefore, companies have a countervailing disincentive to create this documentation if it is seen as a potential liability in future litigation. Opposing parties might seek evidence that the AI developer or provider was aware of a system flaw, particularly if they believe a company conducted the testing but is unable to demonstrate appropriate action based on the findings.

Attorney-client privilege 101

Attorney-client privilege is a legal doctrine that shields confidential communication between attorneys and their clients when those communications were made for the purpose of obtaining or providing legal advice. The privilege extends beyond verbal discussions to encompass written correspondence, reports and other information exchanged during these covered interactions. Specific rules vary by state, but the fundamental principles of privilege are generally consistent. We focus here on the federal standard.

Privilege must first be claimed by the party to the suit for the specific communications at issue, and then the court will determine whether privilege applies. It is a highly contextual, fact-specific inquiry.

In the context of red teaming generative AI, the court will likely ask two primary questions: Whether the information was obtained and communicated for a legal or business purpose and whether any privilege was already waived because the communication was disclosed to third-parties or excessively disseminated within the organization.

To determine if privilege applies to a red teaming report and related documentation, the court will assess whether the testing was necessary to obtain legal advice regarding the system's potential liabilities, along with how the testing process and results were internally managed. No court has directly ruled on privilege in this context. However, based on legal precedent and existing legal principles, our best assessment is that privilege would likely apply if red-team testing is performed at the direction of counsel for the purpose of providing legal advice about compliance with applicable laws, and the results are maintained appropriately. But there are no guarantees.

Question 1: Was the information gathered for a legal or business purpose?

Privilege protects legal communications only. Nonlegal communications, such as business advice, are not protected. Most courts use a primary purpose test to make this determination, although there are variations. Under this test, the court balances the legal and nonlegal objectives that drove a particular communication. If the dominant purpose was to obtain legal advice, the communication is privileged. If the primary purpose was business-related, it is not privileged.

The clearest path to establishing privilege exists when the primary objective of red teaming is to ensure compliance with specific laws or to uncover model outputs that could give rise to legal liability. For example, when considering copyright infringements, lawyers can apply their understanding of intellectual property law to assess the substantial similarity to copyrighted text or images of a generative AI system's output.

Similarly, a generative AI system used for employment decisions can be red teamed for biased output along protected class categories, so lawyers can advise on concerns under antidiscrimination laws.

Conversely, privilege may be more difficult to assert when testing is performed for general system performance with no defined regulatory context. Routinely involving counsel in red teaming activities with no concrete legal dimension may reduce the chances of related communications being privileged.

Legal red teaming should, therefore, be clearly distinguished from ordinary testing for business purposes, as courts have held that routine work in the ordinary course of business is not privileged and cannot be shielded from discovery through the mere involvement of an attorney.

The Target Customer Data Security Breach Litigation provides a helpful roadmap to make this distinction. In this case, Target conducted two parallel investigations — one for business purposes and one at the direction of outside counsel for legal guidance. Because the two investigations were separated, the latter investigation was deemed privileged.

A similar two-track approach could be applied to generative AI red teaming. One track could concentrate on assessing model performance or other nonlegal concerns, while the other is directed by legal counsel to identify and mitigate potential legal risks.

Ideally, to support an assertion of privilege, legal and business testing should be performed by different teams with the legal team's red teaming efforts specifically focused on identifying potential regulatory risks or liability. Documentation and communications should consistently reflect this legal focus, and reports should link potential legal implications to testing outcomes.

Similarly, when retaining external red teams, the engagement letter should specify that the purpose is to provide legal advice about potential legal liability associated with the model.

Companies might conduct the red teaming internally or employ a third party for red teaming, including outside counsel. Courts are more likely to recognize privilege in communications issued by external counsel, rather than those generated internally.

Nonattorney outside experts can also conduct AI red teaming under privilege, if structured properly. Courts have held that a third party's work may be privileged if it is necessary to translate technical concepts and enable an attorney to provide legal advice.

To increase the likelihood of maintaining privilege when nonattorney red teams are engaged, the third party should be retained by outside counsel and directed by an attorney, the engagement letter should specify that the work is to assist in providing legal advice, counsel should document their reliance on the third party's findings, and a vendor other than the one that conducted business red teaming should be hired.

Question 2: Was privilege already waived?

The second part of the court's assessment will consider whether the protection has been waived through disclosure to third parties that do not share a common legal interest, including intentional disclosure to regulators, or through extensive internal dissemination.

The complexity here is that intentional disclosure to third parties may trigger waiver not only for the specific materials disclosed, but for other related communications. As a result, disclosing even partial testing materials from red teaming or a summary of a longer report risks waiving privilege fpr the underlying findings and other documents.

Under these circumstances, the court will consider whether fairness requires the undisclosed portions to be considered together with the disclosed materials, based on:

The strategic purpose behind the disclosure. Waiver is more likely if the disclosure seems aimed at gaining a tactical advantage, since courts have repeatedly said companies may not invoke privilege as both a sword and a shield.
Prejudice to the opposing party. If the disclosure reveals only favorable information while holding back the rest, a court is more likely to find waiver.
The level of detail disclosed. There is a greater risk of waiver if the disclosed material reveals the substance of the communication, such as substantive excerpts, details of the counsel's analysis or specific facts the analysis was based on.

The safest course of action is to not disclose any material based on the red teaming endeavor to regulators or the public, or to make the material as minimal and high-level as possible.

Sharing the red teaming documentation internally can also vitiate privilege if the recipients lack a "need to know." Courts have held that privileged communications may only be shared with employees whose duties reasonably require them to have access.

Therefore, companies should restrict access to red teaming documentation and findings to employees in roles directly related to addressing the legal risks identified in testing. The group should be kept as small as possible. To reflect this, companies should identify those with access and impose strict restrictions on further dissemination.

Limitations of privilege

It is important to understand that, even if red teaming documentation itself is protected by attorney-client privilege, this does not protect the underlying facts from discovery.

The facts learned during red teaming remain discoverable through other means, such as depositions and interrogatories. Opposing counsel can question employees about system capabilities identified during testing, even if the employees only learned those facts from a privileged report.

Recommended best practices for preserving privilege in AI red teaming

Engage outside counsel for legal red teaming. Having outside counsel actually perform the testing provides the clearest path to establishing privilege. Otherwise, outside counsel should closely manage the third-party engagement.

Clearly separate legal and business red teaming. The legal workstream should focus exclusively on evaluating potential regulatory compliance issues and legal risks. Business testing objectives should be managed separately.

Document the legal purpose. Ensure all documentation reinforces its legal purpose, including engagement letters and reports, which should directly link testing findings to specific legal implications.

Exercise caution with disclosures: If information is disclosed externally, companies should keep the information as high-level as possible, omitting specific findings, results or conclusions. Disclosures should never be made strategically to gain an advantage.

Limit internal sharing. Internally, share privileged communications only with employees who have a legitimate need to know the information, generally those involved in addressing legal risks. Track distribution, and instruct recipients against further dissemination.

Proper management is essential

Red teaming is essential for responsible governance of generative AI systems, but it can create legal risks when it demonstrates a company's knowledge of a system's vulnerabilities. Properly managing legal privilege during red teaming is therefore essential for companies to protect sensitive information while maintaining regulatory compliance and prioritizing AI safety.

‍

Originally published on the IAPP website on August 14, 2014

‍