Three security researchers have disclosed how two new threat modes can flip Generative AI model behavior from serving your GenAI applications to attacking them. While not being as dangerous as the fictional Skynet scenario from the Terminator movie franchise, the PromptWare and Advanced PromptWare attacks demonstrated do provide a glimpse into the “substantial harm” that a jailbroken AI system can cause. From forcing an app into causing a denial of service attack to using app AI to change prices in an e-commerce database, the threats are not only very real but are also likely to be used by malicious actors unless the potential harms of jailbreaking GenAI models are taken more seriously.

Introducing The PromptWare GenAI Threat

In a study titled “A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares,” a collaboration between Technion – Israel Institute of Technology, Cornell Tech, and Intuit, researchers argue that while a jailbroken GenAI model itself may not pose a significant threat to users of conversational AI, it can cause substantial harm to GenAI-powered applications. The new threats can force such apps to perform malicious activities beyond just providing misinformation and returning offensive content.

The researchers, Stav Cohen a PhD student at Technion – Israel Institute of Technology, Ron Bitton, principal AI security researcher at Intuit and Ben Nassi, BlackHat board member, said they are publishing the research to help “change the perception regarding jailbreaking,” and demonstrate the “real harm to GenAI-powered applications” a jailbroken GenAI model can pose.

It’s easy to see why many security professionals don’t take these kind of threats to GenAI seriously, using prompts to get a chatbot to insult the user is hardly the crime of the century. Any information that a jailbroken chatbot could be prompted to provide is going to be available on the web itself, or the dark web in some cases. So, why should anyone consider such jailbreaking threats as dangerous? “Because GenAI engine outputs are used to determine the flow of GenAI-powered applications,” the researchers explain, which means a jailbroken GenAI model “can change the execution flow of the application and trigger malicious activity.”

What Is PromptWare?

The researchers refer to PromptWare as a zero-click malware attack as it doesn’t require a threat actor to have compromised the GenAI application prior to executing the attack itself.

Think of PromptWares as being user inputs that consist of a jailbreaking command that forces the GenAI engine itself to follow the commands that the attacker will issue, plus additional commands that are created so as to trigger a malicious activity.

The malicious activity itself is achieved by forcing the GenAI to return the needed output to orchestrate the malicious activity within the application context. Within this context of a GenAI-powered app, the jailbroken engine is turned against the application itself and allows attackers to determine the execution flow. The outcome will, of course, be dependent on the permissions, context, implementation and architecture of the app itself.

GenAI engines do have guardrails and safeguards, such as input and output filtering, designed to prevent such misuse of the model, but researchers have determined numerous techniques that enable jailbreaking nonetheless.

In order to demonstrate how attackers can exploit a dedicated user input, based on knowledge of the logic used by the GenAI app, to force a malicious outcome, the researchers revealed PromptWare being used to perform a denial of service attack against a plan and execute-based application. “We show that attackers can provide simple user input to a GenAI-powered application that forces the execution of the application to enter an infinite loop, which triggers infinite API calls to the GenAI engine (which wastes resources such as money on unnecessary API calls and computational resources) and prevents the application from reaching a final state,” they wrote.

The steps involved to execute such a DoS attack are:

  • The attacker sends an email to the user by way of the GenAI assistant.
  • The GenAI app responds by querying the GenAI engine for a plan and sends this as a draft reply.
  • The app executes the task to find a suitable time to schedule the requested meeting by querying the user’s calendar API.
  • The app executes the task using the GenAI engine.
  • The app executes the EmailChecker task and determines it to be unsafe.
  • The app executes a task to rephrase the text.
  • The app executes the EmailChecker task and determines it to be unsafe.
  • An infinite loop is created and hence a DoS has been executed.

What Is The Advanced PromptWare Threat?

A much more sophisticated version of the basic PromptWare attack can also be executed and has been called an Advanced PromptWare Threat by the researchers.

An APwT attack can be used even when the logic of the target GenAI app is unknown to the threat actor. The researchers show how an attacker can use an adversarial self-replicating prompt able to autonomously determine and execute malicious activity based on a real-time process to understand the context of the app itself, the assets involved and the damage that can be inflicted.

In essence, the APwT attack uses the GenAI engine’s own capabilities to launch a kill chain in ‘inference time’ using a six-step process:

  1. Privilege Escalation – a self-replicating prompt jailbreaks theGenAI engine to ensure that the inference of the GenAI engine bypasses the GenAI engine’s guardrails.
  2. Reconnaissance A – a self-replicating prompt queries the GenAI engine regarding the context of the application.
  3. Reconnaissance B – a self-replicating prompt queries the GenAI engine regarding the assets of the application.
  4. Reasoning Damage – a self-replicating prompt instructs the GenAI engine to use the information it obtained in the reconnaissance to reason the possible damages that could be done.
  5. Deciding Damage – a self-replicating prompt instructs the GenAI engine to use the information to decide the malicious activity from different alternatives.
  6. Execution – a self-replicating prompt instructs the GenAI to perform the malicious activity.

The example shown by the researchers demonstrates how an attacker, without any prior knowledge of the GenAI engine logic, could launch a kill chain that triggers the modification of SQL tables so as to potentially change the pricing of items being sold to the user via a GenAI-powered shopping app.

AI Developers And Security Experts Respond To The PromptWare Research

I reached out to both Google and OpenAI for a statement regarding the PromptWare research. Google had not responded before publication, however, an OpenAI spokesperson said: “We are always improving the safeguards built into our models against adversarial attacks like jailbreaks. We thank the researchers for sharing their findings and will continue to regularly update our models based on feedback. We remain committed to ensuring people can benefit from safe AI.”

Erez Yalon, head of security research at Checkmarx, said that “Large Language Models and GenAI assistants are the latest building blocks in the modern software supply chain, and like open-source packages, containers, and other components, we need to treat them with a healthy amount of caution. We see a rising trend of malicious actors attempting to attack software supply chains via different components, including biased, infected, and poisoned LLMs. If jailbroken GenAI implementation can become an attack vector, there is no doubt it will become part of many attackers’ arsenal.”

The researchers have published a video on YouTube that demonstrates the PromptWare threat and a FAQ can be found on the PromptWare explainer site.

Share.
Exit mobile version