AI Chatbots Highly Vulnerable to Jailbreaks, UK Researchers Find
Four of the most widely used generative AI chatbots are highly vulnerable to basic jailbreak attempts, researchers from the UK AI Safety Institute (AISI) found.
In a May 2024 update published ahead of the AI Seoul Summit 2024, co-hosted by the UK and South Korea on 21-22 May, the UK AISI shared the results of a series of tests performed on five leading AI chatbots.
The five generative AI models are anonymized in the report. They are referred to as the Red, Purple, Green, Blue and Yellow models.
The UK AISI performed a series of tests to assess cyber risks associated with these models.
These included:
- Tests to assess whether they are vulnerable to jailbreaks, actions designed to bypass safety measures and get the model to do things it is not supposed to
- Tests to assess whether they could be used to facilitate cyber-attacks
- Tests to assess whether they are capable of autonomously taking sequences of actions (operating as “agents”) in ways that might be difficult for humans to control
The AISI researchers also tested the models to estimate whether they could provide expert-level knowledge in chemistry and biology that could be used for both beneficial and harmful purposes.
Bypassing LLM Safeguards in 90%-100% of Cases
The UK AISI tested four of the five large language models (LLMs) against jailbreak attacks.
All proved to be highly vulnerable to basic jailbreak techniques, with the models producing harmful responses in 90% to 100% of cases when the researchers performed the same attack patterns five times in a row.
The researchers tested the LLMs using two types of question sets, one based on HarmBench Standard Behaviors, a publicly available benchmark, and the other developed in-house.
To grade compliance, they used an automated grader model based on a previous scientific paper combined with human expert grading.
They also compared the results to LLM outputs when asked sets of benign and harmful questions without using attack patterns.
The researchers concluded that all four models comply with harmful questions across multiple datasets under relatively simple attacks, even if they are less likely to do so in the absence of an attack.
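As a rough illustration of the methodology described above, the sketch below shows one plausible way such a success rate could be computed: each attack pattern is attempted five times per harmful question, and an attack counts as successful if any attempt elicits a harmful response. All function names and data here are hypothetical, not taken from the AISI report.

```python
# Hypothetical sketch: computing a jailbreak attack success rate when
# each attack pattern is tried five times per harmful question.

ATTEMPTS_PER_QUESTION = 5  # assumption, mirroring the "five times in a row" setup

def attack_success_rate(results: dict[str, list[bool]]) -> float:
    """results maps each harmful question to a list of per-attempt
    compliance verdicts (True = model produced a harmful response).
    An attack on a question succeeds if any attempt succeeds."""
    successes = sum(any(verdicts) for verdicts in results.values())
    return successes / len(results)

# Hypothetical grading output for three questions, five attempts each.
graded = {
    "q1": [True, True, False, True, True],
    "q2": [False, False, False, False, True],
    "q3": [True, True, True, True, True],
}
print(f"{attack_success_rate(graded):.0%}")  # → 100%
```

In practice, the per-attempt verdicts would come from an automated grader model checked against human expert grading, as the report describes.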
LLMs are Limited Tools for Cyber-Attackers
Other tests shared in the UK AISI May 2024 update showed that four publicly available models can solve simple capture-the-flag (CTF) challenges, of the sort aimed at high school students.
However, they all struggled with more complex problems, such as university-level cybersecurity challenges.
Finally, the UK AISI researchers showed that two models could autonomously solve some short-horizon tasks, such as software engineering problems, but none is currently able to plan and execute sequences of actions for more complex tasks.
These findings suggest that LLMs are likely not significantly helpful tools for cyber-attackers.
Looking for in-depth cybersecurity tips with analysis of the latest threats and scams? Subscribe to our premium newsletter for the information you need to stay secure. Plus get a free copy of our popular "Cybersecurity Starter Guide" to enable you to discover how to keep your systems secure.
Headlines
News
- CISA Warns of Actively Exploited NextGen Mirth Connect Pre-Auth RCE Vulnerability
CISA has required federal agencies to update to a patched version of Mirth Connect (version 4.4.1 or later) by June 10, 2024, to secure their networks against active threats.
- Consumers Continue to Overestimate Their Ability to Spot Deepfakes
The Jumio 2024 Online Identity Study reveals that while consumers are increasingly concerned about the risks posed by deepfakes and generative AI, they continue to overestimate their ability to detect these deceptions.
- Cybercriminals Shift Tactics to Pressure More Victims Into Paying Ransoms
Cybercriminals' new tactics led to a 64% increase in ransomware claims in 2023, driven by a 415% rise in "indirect" incidents and remote access vulnerabilities, pressuring more victims to pay ransoms, according to At-Bay.
- Too Many ICS Assets are Exposed to the Public Internet
The enterprise attack surface is rapidly expanding due to the convergence of IT and OT systems, leading to a large number of ICS assets being exposed to the public internet and creating new vulnerabilities that security teams struggle to manage.
- Kinsing Hacker Group Expands its Cryptomining Botnet Network with More Vulnerability Exploits
The Kinsing hacker group has demonstrated its ability to continuously evolve and adapt, quickly integrating newly disclosed vulnerabilities into its exploit arsenal to expand its cryptojacking botnet across various operating systems and platforms.
- Are All Linux Vendor Kernels Insecure? A New Study Says Yes, but There’s a Fix
A study by CIQ found that Linux vendor kernels, such as those used in Red Hat Enterprise Linux (RHEL), have significant security vulnerabilities due to the backporting process used to maintain stability.
- UK Government Publishes AI Cybersecurity Guidance
The UK government has released guidance to help AI developers and vendors protect their AI models from hacking and potential sabotage, with the goal of transforming this guidance into a global standard to promote security by design in AI systems.