A practical guide to AI safety and security

Monday, March 25, 2024


5 min read

Ankita Kumari

Co-founder, SydeLabs

New threat landscape for GenAI systems
New threat landscape for GenAI systems
New threat landscape for GenAI systems

When it comes to thinking about AI safety and security, it is crucial for individuals and organisations to start considering these aspects early on. The landscape of AI is rapidly evolving, and with this evolution comes the need to address security and safety concerns related to AI systems.

Historically, Security and Risk Management have been seen as cost centres in organisations. With the adoption of generative AI, we see this trend changing. CISOs and CXOs globally are recognising the threat of financial loss and reputation damage emerging as cost of inaction on these new risk vectors.

What do you mean by AI safety and security

AI safety and security refers to the practices to ensure your AI systems are protected from attack vectors such as prompt injections, jailbreaks, toxic language, harmful content, data leakage and any form of abuse.

AI security vs Traditional cybersecurity

Traditional cybersecurity deals with deterministic outputs and hence works on a pattern based approach. AI systems on the other hand being probabilistic in nature require intent based protection where traditional cybersecurity measures fall short.

Also, with any change in traditional systems the outcomes are predictable whereas with AI systems even a small change can bring unpredictable outcomes. Observability and traceability is also low in AI systems as compared to traditional systems. Because of all these key differences between AI systems and traditional systems, traditional cybersecurity cannot take care of AI security.

When to start thinking about AI safety and security

The single most important premise when it comes to safety and security of any system is to preempt the vulnerabilities and stay ahead of the bad actors. So is the case with AI safety and security. The bad actors globally have already devised multiple means, techniques and attack profiles. The best bet for enterprises is to start thinking about AI safety and security from the day you start building an AI model or an application powered by one.

But why do I protect against a threat whose odds are minimal

While the odds of exploitation of a safety and security vulnerability in your AI system can be minimal, the impact of it will mostly be high and in the form of both, financial loss and reputation damage.

Where do I start

As you start building AI systems, think about their safety and security alignment alongside

  • Iteratively identify blindspots in your model early on in the development cycle through a comprehensive red teaming exercise and fix the vulnerabilities by fine tuning

  • Have a run time protective layer for your AI systems which prevents attacks and abuse caused due to any vulnerability which might have been missed while red teaming

How will red teaming be helpful

Red Teaming and specially Automated Red Teaming can help iteratively vulnerabilities in your model early on. After fixing and improving on the vulnerabilities identified, a production deployment of the AI model/ application is less prone to any security attack or abuse. This helps safeguard against financial loss and any damage to the brand reputation in case of the model giving out toxic or harmful content.

How does run time security of AI systems work

AI systems need to stay up and responsive all the time. Hence, run time security for these systems in indispensable. In run time the protection layer of your AI system should evaluate every incoming prompt/ query on it’s intent and give out a threat score for the prompt. Basis the threat score your application can decide whether to block the user, give out a standard harmless response, monitor the user etc. Similarly the protection layer should also gauge the response coming from your LLM for any harmful content, data leakage etc and give it a similar score. So in a nutshell, in run time, it’s about identifying the threat potential of the input and output and taking an appropriate action on it in real time.

What happens when I fine tune my model for better accuracy before deployment

Fine tuning a model for any functional improvement or otherwise can completely change the safety and security alignment of the model. Hence, red teaming of models needs to happen iteratively after each fine tuning.

Does it matter which foundational model I start with to ensure better safety and security

In our research and testing of open source foundational models, we have seen them vary widely in terms of safety and security alignment. And hence you would see each of them getting a different SydeScore which is a composite score of our Automated AI red teaming of these models. We will soon publish the comparison study of most open source models around safety and security. So if you are directly using any of these models for production use case without any customisation, it is better to use a model which is more robust.

However, if you build your custom models using these foundational models and further fine tune it, the safety and security alignment of the model is anyways disturbed and you would have to iteratively red team, identify and fix those vulnerabilities.

What are some recent examples of safety and security vulnerabilities of LLMs causing concerns for enterprises

There have been many recently, to speak of a few:

  1. Google was summoned when Gemini called Mr. Narendra Modi (the current Prime Minister of India) a fascist

  2. Customer chatbot of Air Canada was compromised which led to the company having to refund the fare for a customer of a trip which was already made


AI safety and security is a continual improvement process both from an enterprise point of view and from the point of view of solutions being built for the same. Yet, starting early on this journey can help enterprises deploy their AI systems confidently and actually reap the benefits of this powerful technology. Any powerful technology comes alongside its perils. The onus of responsible AI adoption is upon us. Ensuring safety and security of AI systems goes a long way in fulfilling this responsibility and also serving it for the advantage of the enterprise.

Ankita Kumari

Co-founder, SydeLabs

San Francisco, California

Protect your generative AI applications from the ever-expanding threat landscape of LLM systems.

San Francisco, California

Protect your generative AI applications from the ever-expanding threat landscape of LLM systems.

San Francisco, California

Protect your generative AI applications from the ever-expanding threat landscape of LLM systems.