Blog

The SydeLabs approach to Automated Red Teaming : Revealing SydeBox

Friday, March 1, 2024

·

5 min read

Ankita Kumari

Co-Founder, SydeLabs

SydeLabs protects your AI system
SydeLabs protects your AI system
SydeLabs protects your AI system

SydeLabs is launching SydeBox- an Automated Red Teaming solution which runs comprehensive scans on any GenAI powered system to detect vulnerabilities. The solution has been effective in finding vulnerabilities in popular open source foundational and fine tunes models such as OpenAI GPTs, Gemini, Llama, Mistral, Cohere and more.

AI Red Teaming

In our previous blog post (Red Teaming Generative AI systems) we discussed the concept of Red Teaming in the GenAI era. As discussed, enterprises building applications powered by GenAI, would require Red Teaming efforts which account for the expanded scope of security threats, the probabilistic nature of GenAI powered outputs and the continuous validation needed with fine tuning of models.

All of this implies that the Red Teaming effort in the GenAI age is relatively more complex and resource intensive as compared to traditional Red teaming efforts. However, to uphold the integrity and safeguard the security of generative artificial intelligence (AI) within an ever-evolving digital landscape, it necessitates ongoing assessment and a more expansive view on cybersecurity concerns which points to Automated Red Teaming.

AI Red Teaming with SydeLabs

SydeLabs serves as a comprehensive hub catering to all facets of AI Security and Risk Management. Integral to our portfolio of offerings is Automated Red Teaming and hence with great zeal, we introduce SydeBox—an Automated Red Teaming solution which is model agnostic and scans for multiple categories of vulnerabilities.

Introducing SydeBox

SydeBox provides a user-friendly solution allowing users to input endpoints of their desired models or applications for vulnerability scanning. This tool conducts Automated Red Teaming on the specified model and delivers a detailed report outlining the vulnerabilities detected. A key feature of this report is the 'SydeScore,' a comprehensive score assessing the target's resilience across user-selected attack categories. Rated on a scale of 100, the SydeScore offers enterprises valuable insights into the robustness of their models against potential attacks and misuse. The report gives you a breakup of identified vulnerabilities by severity and category.

SydeBox Vulnerability Report of a scanned target

Red Teaming by SydeBox

The Beta provides 6 key categories for running the scan currently which will be expanded to 10 categories for the final product launch. Sydebox beta includes:

  1. Prompt Injection

  2. Jailbreaks

  3. Evasion

  4. Safety

  5. System Prompt Leakage

  6. Known Popular Attacks

Within each category there are multiple tests and the tool makes multiple attempts to break a system. The Automated Red Teaming while spanning multiple categories and tests, also accounts for multiple techniques of attack such as DAN, dev mode and token smuggling.

Each category is complex enough to warrant a separate blog. Stay tuned to learn more about the different techniques and the categories that can be used to attack your targets.

Using SydeBox Effectively

SydeBox opens up for a private Beta on 1st of March, 2024. You can request access to the beta product here.

Overall, SydeBox runs a very exhaustive and comprehensive scan of the target and then generates a report with clearly outlined vulnerabilities.

The first step to using SydeBox effectively, once you receive access, is to run a scan on a pre-configured target. Currently we have added OpenAI's GPT3.5 Turbo as a preconfigured target for you to get an understanding of how a scan report looks like. 

New Scan user experience on SydeBox. 

Meanwhile, you can refer to the product documents and add your own targets. You can add multiple targets and run scans on all of them at the same time. Our recommendation is to run all categories for the initial few scans on your targets to gain a comprehensive understanding of your system. SydeBox also allows you to choose which categories of tests you want to run on your target.

Once a scan is completed, you can view the vulnerability report of the respective target as identified by the most recent scan on the Dashboard. You could also refer to the Scan History page to look at reports of previous scans run on the same target and analyse improvements you have made.

Using SydeBox on your experimental or proof-of-concept targets would increase your confidence to deploy your generative AI application to production at scale.

SydeLabs commitment to securing AI

SydeLabs team relentlessly enhances vulnerability detectors to scan your systems. SydeBox is designed to continuously monitor your systems and it's test suite is always updated with zero-day exploits. An organisation can now be confident in their generative AI initiatives because of the insights they receive by using SydeBox.

In the coming weeks, we will be publishing vulnerability reports generated by using SydeBox on various open source models.

Ankita Kumari

Co-Founder, SydeLabs

San Francisco, California

Protect your generative AI applications from the ever-expanding threat landscape of LLM systems.

San Francisco, California

Protect your generative AI applications from the ever-expanding threat landscape of LLM systems.

San Francisco, California

Protect your generative AI applications from the ever-expanding threat landscape of LLM systems.