Blog

AI Firewalls: Securing the future of Generative AI systems

Monday, April 8, 2024

10 min read

Sailesh Mishra

Founding Member, SydeLabs

All around security for LLMs, the brain of AI

How do Firewalls work

A firewall is a security system that monitors and controls incoming and outgoing network traffic based on predetermined rules or policies. There are two primary categories of firewalls:

Network Firewall: This type of firewall filters traffic between an internal network and external networks such as the internet. It can be hardware-based or software-based and operates at either the packet filtering level (filtering individual packets), circuit-level gateway (monitoring TCP connections), or application-gateway level (inspecting specific protocols).
Web Application Firewall (WAF): A WAF specifically protects web applications from attacks by inspecting HTTP(S) requests to detect and block malicious input data. Unlike traditional network firewalls, which focus on IP addresses and ports, WAFs analyze content within each request to identify potential threats.

Web Application Firewalls (WAFs) protect web applications by inspecting incoming HTTP(S) traffic and analyzing its content, headers, and parameters. To understand how WAFs check for patterns, consider the following processes:

Rule-Based Pattern Matching: WAFs utilize predefined rules to identify and block common attack patterns, such as SQL injection, cross-site scripting (XSS), and directory traversal. Rules are continuously updated to account for newly discovered vulnerabilities and emerging threats.
Machine Learning and Behavioral Analytics: Modern WAFs leverage artificial intelligence algorithms and threat intelligence to improve pattern recognition beyond static rule sets. This allows WAFs to adapt to novel attack techniques and better predict future threats.

The inspection process occurs at multiple layers, ensuring thorough examination of both the content and structure of requests and responses. WAFs can be deployed in various modes, such as inline or bridge mode, depending on the desired level of intervention and transparency

Increased Risk Surface Area of Generative AI systems

How do Firewalls Work?

A WAF specifically protects web applications by inspecting incoming HTTP(S) traffic and analyzing its content, headers, and parameters. WAFs utilize predefined rules to identify and block common attack patterns, such as SQL injection, cross-site scripting (XSS), and directory traversal. Rules are continuously updated to account for newly discovered vulnerabilities and emerging threats.

What is different now with genAI?

Every organization has long sought to enhance customer experience while balancing profitability. Striving to deliver exceptional experiences at every touchpoint, particularly in conversational interactions, they grappled with the trade-offs between automated systems that lacked efficacy and costly human intervention.

Intent tagging and semantic analysis emerged as pivotal tools for chatbots, enabling them to discern patterns in user inputs and respond accordingly. However, as the range of use cases expanded, reliance solely on pattern matching proved insufficient.

Throughout the customer journey, the critical missing element was always context.

Enter generative AI technology. Spearheaded by the GPT revolution initiated by ChatGPT, it facilitates human-like communication between humans and machines by comprehending context. Generative systems excel at anticipating the next word within the conversational context. Yet, managing context at scale presents its own challenges.

Would a firewall secure a genAI feature?

In the simplest use case, let's consider a customer support chatbot.

Traditional chatbots often operate with fixed templates for user queries. When faced with custom inquiries, they may falter, leading to redirection to a human agent for assistance.

In contrast, a genAI-powered chatbot requires no such constraints. It can adeptly interpret user language, analyze attachments, and extract contextual cues to craft relevant responses.

Imagine an application firewall safeguarding this genAI chatbot. While a conventional WAF can filter and permit user prompts based on predefined strings and configured rules like IP tracing or rate limits, it may struggle when faced with encrypted attacks not explicitly addressed in its rules.

Moreover, most firewalls operate in a stateless manner, handling each query in isolation without retaining context from previous interactions. While some advanced firewalls extend protection to outbound traffic to prevent data breaches, accurately parsing outbound data in real-time poses significant challenges.

In essence, conventional firewalls, with their reliance on patterns, lack the contextual analysis capabilities necessary to differentiate genuine user behavior from potential threats.

Intent over Pattern: keeping genAI system safe

Claiming that generative AI understands intent just like humans would be off the mark. However, genAI shines in assessing language consistency and relevance, along with grasping the appropriateness of input content. LLMs are designed to comprehend and generate human-like text, allowing them to interpret the meaning and context of input data in a more nuanced manner.

As a result, conventional intent tagging techniques found in classical machine learning fall short when compared to LLMs' contextual grasp, especially in the dynamic environment of ongoing customer support sessions.

When it comes to safeguarding a generative AI system, solely relying on traditional pattern-based systems won't cut it.

Practical Usage of AI Firewalls

Let's now deep dive into two scenarios where traditional firewalls will not be sufficient. A shift is needed in the design of these shields that protect AI applications.

Goal 1: access information that was not intended to be displayed.

You can carry out a SQL injection aimed at manipulating the databases to access such prohibited information. Here's a conventional firewall protecting a system from an SQL injection:

A firewall blocking access after detecting a malicious SQL injection attempt, evidenced by the payload in the URL. By inspecting and filtering user input, this firewall prevents unauthorized database manipulation and information theft via SQL injection attacks.

But in order to achieve the same goal when your feature is powered by a large language model, the rules of a firewall that match for strings are not sufficient. Here's an example of a bad actor trying to jailbreak a model trying to get sensitive information.

jailbreaking LLM to reveal training data

Sometimes, because of the unpredictable nature of generative AI features, you don't necessarily need to jailbreak. E.g. Chatgpt poem thousand times snapshot.

ChatGPT reveals training data when asked to repeat "poem"

So then how do we protect a generative AI system from a bad actor trying to access information it's not supposed to get it's hands on. Enter an adaptive intent-based firewall.

Protecting against sensitive information leaks using AI firewall.

Jailbreak caught by SydeGuard, our AI firewall

A user attempts to extract sensitive data from an AI model, but rather than depending on semantic analysis, the model detects the malicious intent behind the prompt, preventing a potential data leak.

Goal 2: prevent misuse or abuse of a customer support chatbot

LLM based systems have drastically increased the helpfulness of customer support systems. Customers can interact more freely and explain their issues and in most cases, LLMs are capable of reverting with the best answers.

But there is a loophole that has plagued LLM powered customer features of late, and that is when customers use the features to go off-topic. For instance, a search bot on an e-commerce app can be technically used to solve calculus problem; a customer support bot can be used to generate an essay on “the importance of customer empathy”. Such loopholes can be costly for large enterprises deploying generative AI applications at scale.

Traditional firewalls will never pick these up as bad inputs and block them. There are not enough patterns that one can create to protect systems for getting misused. One can control the inputs through pre-configured prompts, but that sets us back a few years in gaining customer trust.

Language based firewalls however, have the capability to gauge the intent of the query or input. Here's a couple of examples where an AI firewall is generating a high threat score for an off-topic query.

Misuse blocked by our AI firewall, SydeGuard

Asking a customer support chatbot to solve trigonometry problem.

another example of off-topic usage of LLMs protected by AI firewall

Asking a chatbot to write an essay instead of resolving a support query.

Enterprises can decide not to process this query and save compute cost and tokens, both being precious assets in the genAI world.

Safeguarding LLM systems with AI firewalls

In safeguarding LLM systems, conventional firewalls fall short due to their limited scope against a vast attack surface. Take, for instance, a customer support chatbot; ensuring its security requires a dynamic runtime guard capable of scrutinizing input queries from customers and making real-time decisions to allow or block access to the underlying LLM.

This novel paradigm of security aligns with the exigencies of generative AI features. It necessitates a multifaceted platform capable of proactive threat detection and adaptive response mechanisms, akin to the intuitive actions of seasoned guards. Such a platform empowers AI engineers to preemptively address vulnerabilities, comply with regulatory mandates, and build organizational resilience against unforeseen contingencies.

For enterprises deliberating the deployment of generative AI for business applications, SydeLabs offers a lifeline. Alternatively, if you're engaged in experimental AI modeling and seek to identify vulnerabilities, consider trying out our automated AI red teaming product.

In essence, SydeLabs stands as the bastion of security, ensuring that the trajectory of generative AI innovation in your organization remains unhindered by apprehension, empowering you to explore the full potential of this transformative technology with confidence and peace of mind.

Is this the end of WAF?

No, not just yet. A guard for LLMs will not replace WAFs for its use cases. Simply put, a WAF will continue to protect deterministic systems and a guard for LLMs will protect probabilistic systems.

In Short

Securing generative AI systems will require a deep understanding of how LLMs work. It will also need an advanced understanding of language and social engineering.

“There are people trying to hack into JPMorgan Chase 45 billion times a day; even worse is that it’s already double what it was last year — and the attacks are only getting more sophisticated.” — Mary Erdoes, CEO, JP Morgan Chase

It’s getting increasingly easy and less costly to attack sensitive systems. It’s almost impossible for traditional rule-based or pattern-based systems to keep up with this massively increased risk surface area. Firewalls, may not go obsolete just yet, but will fall short of most attacks on generative AI systems. The world needs modern guards, more sophisticated, autonomous firewalls, to understand the intent of every input and decide whether it’s an attack or a legitimate request and actively learn in the process.

Sailesh Mishra

Founding Member, SydeLabs