Blog

Red Teaming Generative AI systems

Thursday, February 29, 2024

3 min read

Ruchir Patwa

Co-Founder, SydeLabs

What is Red Teaming

Red Teaming is a seasoned security practice which involves assembling a group of security experts who act as ethical hackers. Their objective? To expose the system’s security weaknesses through meticulously crafted payloads, scenarios, and adversarial inputs. With the advent of generative AI there is an added form of meticulously crafted prompts as well.

As outlined in our previous blog [AI Risk - Unveiling the Paradox], to identify and measure the risk of a generative AI system it is imperative to have a way to measure the “Probability of a negative outcome”. Running a red team exercise via automated, augmented or manual processes on your systems is the best approach to get a better understanding of your current risk exposure.

Red Teaming in the GenAI era

Red Teaming in the realm of GenAI operates on two principal layers: the core model layer, exemplified by technologies like GPT, Gemini, Llama etc., and the more user-facing application layer, as seen in tools such as ChatGPT, Github Copilot, Notion AI etc that deploy a model under the hood. Red teaming each layer presents its own set of benefits.

Red Teaming at the core model stage allows for the early detection of potential misuse avenues, helps outline the model's operational boundaries, and uncovers its inherent constraints. These early discoveries are crucial as they inform the ongoing refinement of the model and help pinpoint optimal application domains.
At the application layer, red teaming adopts a comprehensive system-wide perspective, considering the foundational model as just one element of a larger ecosystem and more importantly can focus on the use-case and business logic of the application.

AI Red Teaming vs Traditional Red Teaming

AI Red teaming turns out to be more complex and non-deterministic. Here's explaining what makes red teaming a genAI application is a more challenging task as compared to red teaming traditional software systems.

Expanded risk exposure of genAI systems

AI Red Teaming needs to cover traditional security concerns like data leakage, unauthorised access etc but also expand to other areas including but not limited to identifying bias, ensuring safety, conforming to content regulations etc.

genAI's probabilistic nature

In a traditional system if you have tried an attack payload once and failed, in most cases you can be sure that the same system will behave the same way for the same payload. This is not true in probabilistic systems like Generative AI systems. The same malicious prompt on first attempt might give a benign outcome but on its second attempt might be successful.

Need for Continuous Validation

For a traditional system when you make a particular change, you can be certain of the immediate impact it will have on the system which is not true for most AI systems. Fine tuning a Generative AI model for a particular use-case can have a completely unrelated impact on its alignment or safety controls. This makes it imperative to continuously red team an AI system even when there are minimal changes being made to it.

Measuring success of genAI attacks is more complex

Measuring the success of an attempted payload on traditional systems is very straightforward. In most cases the status code from the service or the data returned will tell you if the payload was successful or not. For generative AI, success of a payload can be very subjective, for example for a system designed to generate images, identifying if the output has bias can be very subjective.

Conclusion

In summary, Red Teaming in the era of Generative AI is a crucial and complex process that goes beyond traditional cybersecurity measures. It not only identifies vulnerabilities in both the core models and their applications but also adapts to the unique, probabilistic nature of AI systems. This approach requires continuous testing and a broader scope of security considerations, reflecting the nuanced challenges of ensuring safety and ethical compliance in AI technologies. As such, Red Teaming is indispensable for maintaining the integrity and security of Generative AI systems in our increasingly digital world.

Ruchir Patwa

Co-Founder, SydeLabs