Blog

Securing the Future: A Guide to AI Security with MITRE

Tuesday, March 26, 2024

8 min read

Mukunth M

Security Researcher, SydeLabs

Growing AI use cases, Rising Security Concerns

Imagine a self-driving car making a critical decision based on manipulated data. That's the potential danger of AI security vulnerabilities. In today’s digital age, AI has become an essential part of various industries,revolutionizing workflows and driving innovation.We are experiencing a wave of rapid AI adoption across a broad spectrum of applications, from transforming healthcare diagnostics to optimizing logistics in the transportation sector.

However with the rise in the adoption of AI comes the urgent necessity to implement robust security measures for protection of sensitive data and ensuring the integrity of AI systems. As AI continues to transform our world, prioritizing security solutions is paramount to safeguard both organizations and individuals. The complexity of these systems, coupled with the rapid pace of development, makes it challenging to identify and mitigate security threats effectively.

MITRE is a renowned organization which conducts cutting-edge research and develops innovative solutions to strengthen cybersecurity practices and technologies.They have been instrumental in the development of standards and frameworks.

Understanding the growing need to safeguard AI from evolving threats, MITRE have developed the Adversarial Tactics, Techniques, and Common Knowledge for AI (ATLAS) framework to help organizations tackle security challenges. The ATLAS framework serves as a comprehensive knowledge base, containing adversary tactics and techniques observed in real-world attacks.

MITRE ATLAS: A Breakdown of Attack Categories

The MITRE ATLAS framework provides a comprehensive list of various adversarial attacks, but on a higher level the attacks can be categorized into:

Poisoning Attack - Manipulating training data by injecting biased or corrupted information to influence model outputs.
Evasion attack - Leveraging a form of digital camouflage by masking malicious inputs to deceive defenses and bypass security measures.
Functional Extraction - Iteratively querying the model to extract its core functionality, and using that to build a replica of the model.
Inversion attack - Extracting the training data of the model potentially exposing sensitive information
Prompt injection attacks - Carefully crafted malicious prompts that make the model act in unintended ways.
Traditional Cyber attack - Targeting the underlying infrastructure to disrupt AI system operations.

Deep Dive into MITRE ATLAS: Exploring Strategies

Let's shift gears and explore the “why” of the attack, understanding the motive behind an adversary attack. The Atlas framework acts as a roadmap, briefly explaining different tactics attackers use to achieve their goals.

Types of attacks covered in MITRE ATLAS framework

Reconnaissance

Similar to a thief casing a house before a burglary, attackers often try this strategy which involves gathering intel about the organization’s machine learning setup. This includes details like the environment,models used and their functionalities.They gather information by actively searching for organization’s publicly available research material,vulnerability analysis reports and application repositories.This information allows them to develop more tailored attacks targeting the system’s weaknesses.

For instance,researchers at Brown university were able to replicate OpenAI’s GPT-2 model by collecting information about the model architecture,datasets and the training hyperparameters from multiple research papers and the publicly available documentation and the resulting model performed similar to the original GPT-2 model on most datasets.

Resource Development

Just like any good heist requires tools, attackers need resources to pull off successful attacks on AI systems. This strategy involves gathering information through a mixture of legal and illegal means to build an information base which will then be used to deploy targeted attacks.This may involve attempts to steal legitimate user accounts or acquiring information about the model infrastructure which will equip the attacker with the information they need to carry out other types of attacks.

Initial Access

This approach focuses on exploring methods to gain access to the system.For example, attackers could gain access by compromising portions of the ML Supply chain which may include GPU Hardware or parts of the ML software stack.Once they gain access to the system, attackers can launch attacks,steal data or disrupt the operations of the AI system.

Back in December 2022,a sneaky attack targeted the users of popular machine learning library PyTorch by inserting a malicious binary code to the Python Package Index (PyPI) code repository with the same name as the pytorch dependency which led to users installing the malicious package instead of the legitimate one.This is a type of supply chain attack called dependency confusion.This attack was used to extract sensitive information on the devices which installed the malicious package.

Compromised PyTorch Dependency

ML Model Access

This phase revolves around gaining some level of access to the ML model. There are varying levels of access from a full transparency into the model’s inner workings to gaining a foothold in the physical location where the training data is gathered. Attackers may employ different techniques like gaining API access to a model or exploit vulnerabilities in ML enabled products and services.

Execution

This tactic involves the deployment of malicious code on the local or remote system. These malicious code snippets may be embedded in ML artifacts or software, allowing attackers to gain access to the victim's system. Once access is obtained, attackers can attempt to steal the victim's data and exploit vulnerabilities in their network.

ChatGPT Plugin Data Leak

Recently, a security flaw was discovered in ChatGPT where the attacker was able to feed a malicious website through ChatGPT plugins to take control of the chat session and extract the entire chat history.Indirect prompt injection attacks like these can lead to exposure of Personal Identifiable Information (PII).

Persistence

Imagine a student who cleverly hides a cheat sheet in their pencil case to keep using it during exams.That is exactly what an attacker does using a persistence strategy.They ensure continued access to the system even if the victim restarts the system or changes his passwords.Attackers achieve persistence through backdoor attacks, which involve poisoning the training data and using triggers to access the system as needed.

Privilege Escalation

This approach encompasses techniques where attackers attempt to obtain higher-level permissions on a system. Initially, attackers may have limited access to the system, but they will identify weaknesses and exploit them to gain higher-level access, allowing them to take control of the system and execute malicious tasks.

Defense Evasion

Similar to a burglar evading security cameras, attackers employing Defense Evasion utilize a range of techniques to avoid detection by machine learning-based security systems. Attackers may craft adversarial prompts aimed at misleading the model, preventing it from accurately identifying the true intent of the content.

Credential Access

Adversaries utilize this method to acquire user credentials, including usernames and passwords. This involves finding insecurely stored credentials.Attackers search for insecurely stored credentials in locations such as environment variables, bash history, or application repositories, and then misuse them for their own purposes.

ML Attack Staging

In this phase, attackers leverage their knowledge of the victim's system to craft custom attacks targeting its weaknesses. This may involve techniques like training proxy models, poisoning the target model, and crafting adversarial data to feed the target model.

Exfiltration

This approach consists of techniques which are focussed on stealing information from the victim’s system or network. This information encompasses a range of sensitive data such as intellectual property, personal information, or machine learning artifacts. The data is then extracted out of the system using secret channels and might also include a size limit for the data transmission to avoid being noticed.

Google Drive Access Request

Recently, a vulnerability was discovered in Google Colab, a platform extensively utilized for machine learning and data science projects. Sharing notebooks via links is a common practice,but it comes with its risks.Google Colab seamlessly integrates with Google Drive, prompting users to authorize access to their respective drives upon opening a shared notebook. However, the execution of malicious code embedded within these shared notebooks could potentially result in data theft.

Impact

This is all about the real world consequences the attacker aim to achieve which may involve manipulating,interrupting or destroying the victim’s AI system.The attacker may also try to undermine the confidence in the victim’s AI system leading to questioning of the system’s reliability.

The ATLAS Advantage: Building AI You Can Trust

The framework provides organizations with a potent tool to mitigate risks and ensure the responsible and secure development and deployment of Generative AI products. By offering a comprehensive knowledge base of adversary attacks and tactics, it transforms the way organizations tackle GenAI security, enabling a deeper understanding of the underlying mechanisms behind potential threats.

Integrating this framework into the AI development lifecycle empowers organizations to proactively identify attack vectors and establish robust security before they are exploited. Additionally we can leverage the knowledge base to create targeted test cases that can simulate real world attacks.

SydeLabs coverage of MITRE Framework for GenAI security

The MITRE ATLAS framework acts as a blueprint for our AI security solutions, allowing us to continuously refine and improve detection capabilities. It allows us to map our detection methods to these specific attacks, ensuring we actively identify the most relevant and dangerous attempts to bypass security.This approach allows us to stay ahead of the curve and develop highly effective detection mechanisms.

| GenAI threats | How SydeLabs covers MITRE framework for GenAI | | --- | --- | | Expoliting Public facing applications | Sydeguard acts as a security shield, filtering user input in real-time preventing attackers from exploiting your public facing aplications. | | Large Language Model access | SydeBox, our red teaming solution, helps organizations understand vulnerabilities in ML-enabled products/services, safeguarding them against attacks targeting the underlying model. | | Persistence | Prompt injection, a popular technique diverts LLM from its original instructions. Our firewall solution detects and blocks prompt injections, securing your system from these threats. | | Privilege Escalation | Our AI guard effectively blocks jailbreak attempts, preventing unauthorized access and misuse of your AI system | | Evasion | Evasion methods can confuse LLMs, but rest assured, our firewall solution identifies various evasion techniques and thwarts them effectively. | | System prompt extraction | Through our AI red teaming solution, we simulate real world attacks to identify model vulnerabilities, effectively eliminating any potential for system prompt leakage. | | Adverserial Data | Adversarial data injected into the LLM may result in misclassification, but our vigilant AI guard swiftly detects prompts with harmful intent. |

Conclusion

The dance between innovation and security in Artificial Intelligence is a never-ending symphony. As AI pushes boundaries, security measures must advance, fostering a symbiotic connection between progress and protection. The MITRE ATLAS framework allows organizations to get a better understanding of the threat landscape and build better defenses to safeguard their AI systems. It empowers us to not only identify the threats we face today, but also to anticipate how attacks might evolve in future allowing us to be one step ahead.

Mukunth M

Security Researcher, SydeLabs