No, LLM Agents can not Autonomously Exploit One-day Vulnerabilities
April 21st, 2024
Chris Rohlf
I recently came across media coverage of a research paper titled LLM Agents can Autonomously Exploit One-day Vulnerabilities. This paper is from the same set of authors as the paper
I reviewed earlier this year. I'm generally interested
in any research involving cyber security and LLMs; however, I do not agree with the conclusions of this paper
and think it merits further discussion and analysis.
Technical Overview
The researchers built a small dataset consisting of 15 public vulnerabilities in open source software, each
with an assigned CVE (with the exception of the ACIDRain vulnerability, which has none).
The vulnerabilities mostly consist of XSS, CSRF, SQLi, and RCE
in web frameworks, some of which are obscure but others quite popular such as WordPress, RCE
in command line utilities (CVE-2024-21626), information leakage in CVE-2024-25635, and a single Python library (CVE-2023-41334).
There are thousands of CVEs reported every year, and this dataset
is a very small subset of them. The authors wrote that they chose vulnerabilities that matched the following criteria:
1) discovered after GPT-4's knowledge cutoff, 2) highly cited in other academic research, 3) in open source software,
and 4) able to be reproduced by the authors.
On reproducibility, the paper notes: "Beyond closed-source software, many of the open-source vulnerabilities are difficult to reproduce. The reasons
for the irreproducible vulnerabilities include unspecified dependencies, broken docker containers, or underspecified
descriptions in the CVEs."
Anyone who has worked in exploit development knows the difficulty of building old code, recreating environments
and undocumented configurations, and fixing build scripts to run on modern operating systems. However, this requirement
that the vulnerabilities be easy to reproduce is likely at the core of why I think this research can be misleading.
11 out of 15 of the CVEs chosen by the authors were discovered after GPT-4's knowledge cutoff date. This
is important, as it can be hard to tell whether a model is able to reason about a complex technical problem
or whether it is just retrieving information it was trained on.
As the paper puts it: "For GPT-4, the knowledge cutoff date was November 6th, 2023. Thus, 11 out of the 15
vulnerabilities were past the knowledge cutoff date."
The authors state that they've built an LLM agent using GPT-4 that was able to exploit 87% of the vulnerabilities in
their dataset when given access to the CVE description. Without the CVE description, the success rate drops to 7% for GPT-4.
All other models scored 0% regardless of the data provided to them. This is a notable finding, and the authors suggest
in their conclusion that this is evidence of an emergent capability in GPT-4.
The authors do not release their prompts, their agent code, or the outputs of the model. However, they do describe
the general design of their agent at a high level; it is built on the LangChain ReAct framework.
The system diagram shows the agent had access to web search results. This is a critical piece of information I will
return to later in this write-up.
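For illustration only, here is a minimal sketch of what a ReAct-style agent loop with a web search tool can look like.
This is not the authors' agent (their code was not released); the prompt wording and the llm() and web_search() helpers
are hypothetical stand-ins for a GPT-4 completion call and whatever search backend their agent used.

```python
# Hypothetical sketch of a ReAct-style agent loop with a web search tool.
# This is NOT the authors' agent; llm() and web_search() are stand-ins for
# a GPT-4 chat completion call and whatever search backend the agent used.

def llm(prompt: str) -> str:
    """Placeholder for a GPT-4 chat completion call."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Placeholder for a web search API (e.g. returning NVD/advisory snippets)."""
    raise NotImplementedError

def react_agent(cve_description: str, max_steps: int = 10) -> str:
    history = f"Task: exploit the vulnerability described below.\n{cve_description}\n"
    for _ in range(max_steps):
        # Ask the model for a Thought plus an Action in ReAct format.
        response = llm(
            history
            + "\nRespond with 'Thought: ...' followed by either "
              "'Action: search <query>' or 'Action: finish <exploit>'."
        )
        history += response + "\n"
        if "Action: search" in response:
            query = response.split("Action: search", 1)[1].strip()
            # Observations (e.g. a public PoC linked from the NVD entry)
            # are appended to the context for the next step.
            history += f"Observation: {web_search(query)}\n"
        elif "Action: finish" in response:
            return response.split("Action: finish", 1)[1].strip()
    return "gave up"
```

The point of the sketch is simply that once a search tool is in the loop, any public proof-of-concept the search can
reach becomes part of the model's working context.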
Analysis
My analysis after reading this paper is that GPT-4 is not demonstrating an emergent capability to autonomously analyze
and exploit software vulnerabilities, but rather demonstrating its value as a key component of software automation by seamlessly joining
existing content and code snippets. The agent built by the researchers has a web search capability, which means
it is capable of retrieving technical information about these CVEs from the internet. In my analysis of this paper I was
able to find public exploits for 11 of the vulnerabilities, all of which are very simple. These exploits are not difficult to find; each is linked
in the official National Vulnerability Database (NVD) entry for its CVE. In many cases this NVD link is the first
Google search result returned.
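To make that concrete, here is a short sketch that pulls the reference links (including those tagged as exploits) for a
given CVE from the public NVD 2.0 REST API. The endpoint and JSON field names reflect my reading of that API and should
be treated as an assumption, not the authors' method.

```python
import requests

# Fetch the NVD entry for a CVE and print its reference links. References
# tagged "Exploit" typically point at public proof-of-concept code or
# write-ups. Endpoint and JSON field names follow the public NVD 2.0 API.
def nvd_references(cve_id: str) -> None:
    resp = requests.get(
        "https://services.nvd.nist.gov/rest/json/cves/2.0",
        params={"cveId": cve_id},
        timeout=30,
    )
    resp.raise_for_status()
    for vuln in resp.json().get("vulnerabilities", []):
        for ref in vuln["cve"].get("references", []):
            print(ref.get("url"), ref.get("tags", []))

nvd_references("CVE-2024-21626")
```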
The majority of the public exploits for these CVEs are simple, no more complex than a few lines of code. Some of the public write-ups, such as the one for CVE-2024-21626,
explain the underlying root cause of the vulnerability in great detail even though the exploit itself is a simple command line. In
the case of CVE-2024-25635, it appears the exploit is to simply make an HTTP request to the vulnerable URL and extract the exposed API key from the
content returned in the HTTP response.
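If that description is accurate, the entire "exploit" is on the order of the following sketch; the target URL and key
pattern below are hypothetical placeholders, not the actual CVE-2024-25635 endpoint or key format.

```python
import re
import requests

# Sketch of the "exploit" as described: request a page and pull an exposed
# API key out of the response body. The URL and key pattern are hypothetical
# placeholders, not the real CVE-2024-25635 endpoint or key format.
TARGET = "http://target.example/admin/api/some-endpoint"
KEY_PATTERN = re.compile(r'"apiKey"\s*:\s*"([A-Za-z0-9\-_]+)"')

body = requests.get(TARGET, timeout=30).text
match = KEY_PATTERN.search(body)
if match:
    print("leaked API key:", match.group(1))
```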
In the case of CVE-2023-51653, the authors state the agent and GPT-4 were confused by the Chinese-language text the advisory is written in.
However, I was able to manually use GPT-4 to explain in detail what the advisory meant and how the code snippet worked. Extracting
the proof-of-concept exploit from this advisory and exploiting the JNDI endpoint is rather trivial.
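Reproducing that manual step takes little more than a single chat completion call. A sketch using the OpenAI Python
client is below; the prompt wording is mine, and ADVISORY_TEXT stands in for the pasted advisory contents.

```python
from openai import OpenAI

# Ask GPT-4 to translate and explain a vulnerability advisory. The prompt
# wording is mine; ADVISORY_TEXT would hold the Chinese-language advisory
# for CVE-2023-51653, pasted in verbatim.
client = OpenAI()
ADVISORY_TEXT = "..."  # advisory contents go here

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user",
         "content": "Translate this vulnerability advisory to English and "
                    "explain what the included proof-of-concept does:\n\n"
                    + ADVISORY_TEXT},
    ],
)
print(completion.choices[0].message.content)
```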
Similarly, the agent failed to exploit CVE-2024-25640; the authors state this is due to the agent's inability to interact with the application, which is
primarily written in JavaScript. It is somewhat ironic that the agent and GPT-4 are being framed in this research as an
exploitation automation engine yet it cannot overcome this UI navigation issue. My sense is that this limitation could easily be overcome with the right
headless browser integration; however, the authors did not publish their code, so this cannot be verified.
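For what it's worth, driving a JavaScript-heavy UI is a well-solved automation problem. A minimal sketch using
Playwright's sync API is below; the URL and selectors are hypothetical placeholders, the point being only that a
headless browser tool could hand the rendered DOM back to the agent.

```python
from playwright.sync_api import sync_playwright

# Minimal headless-browser interaction of the kind an agent tool could wrap.
# The URL and selectors are hypothetical placeholders, not the CVE-2024-25640
# target; the point is only that JavaScript-rendered UIs are scriptable.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("http://target.example/login")
    page.fill("#username", "admin")
    page.fill("#password", "password")
    page.click("button[type=submit]")
    page.wait_for_load_state("networkidle")
    print(page.content()[:500])  # hand the rendered DOM back to the agent
    browser.close()
```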
The paper makes the following claim about non-web vulnerabilities:
"Finally, we note that our GPT-4 agent can autonomously exploit non-web vulnerabilities as
well. For example, consider the Astrophy RCE exploit (CVE-2023-41334). This exploit is in
a Python package, which allows for remote code execution. Despite being very different
from websites, which prior work has focused on (Fang et al., 2024), our GPT-4 agent can
autonomously write code to exploit other kinds of vulnerabilities. In fact, the Astrophy
RCE exploit was published after the knowledge cutoff date for GPT-4, so GPT-4 is capable
of writing code that successfully executes despite not being in the training dataset. These
capabilities further extend to exploiting container management software (CVE-2024-21626),
also after the knowledge cutoff date."
I would be surprised if GPT-4 were not able to extract the steps for exploiting CVE-2023-41334 given
how detailed the write-up is. A true test of GPT-4 would be to provide the CVE description only, with
no ability to search the internet for additional information. I attempted to recreate this capability
by providing only the CVE description to GPT-4; it was unsuccessful, as the CVE description fails to mention
the specific file descriptor needed, which is retrieved from /sys/fs/cgroup. However, this detail is
provided in the public proof-of-concept exploits.
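That missing detail is easy to recover on a vulnerable host. The public PoCs rely on identifying a leaked descriptor
that points into /sys/fs/cgroup, which a check like the following sketch can surface (run inside the container; the
specific fd number varies by runc and Docker version).

```python
import os

# Enumerate this process's open file descriptors and report any that point
# into /sys/fs/cgroup. The public CVE-2024-21626 PoCs rely on exactly such a
# leaked descriptor; which fd it is varies by runc/Docker version.
for fd in os.listdir("/proc/self/fd"):
    try:
        target = os.readlink(f"/proc/self/fd/{fd}")
    except OSError:
        continue
    if target.startswith("/sys/fs/cgroup"):
        print(f"fd {fd} -> {target}")
```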
Given that the majority of these exploits are public and easily retrievable by any agent with web search abilities, my takeaway is that this
research demonstrates GPT-4's ability to be used as an intelligent scanner and crawler that still relies on some brute force approaches even
once the right exploitation steps are obtained, not an emergent cyber security capability. This is certainly a legitimate
use case and a demonstration of GPT-4's value in automation. However, this research does not prove or demonstrate that GPT-4 is
capable of automatic exploit generation or "autonomous hacking", even for simple vulnerabilities where the exploit is just a few
lines of code.
The paper's conclusion that its agents are capable of "autonomously exploiting" real world systems implies they are able to
find vulnerabilities and generate exploits for those vulnerabilities as they are described. This is further implied by the fact that even GPT-4 failed to exploit the vulnerabilities
when it was not given a description of the CVE. However, this isn't proven, at least not with any evidence provided by this paper. GPT-4 is not
rediscovering these vulnerabilities, and no evidence has been provided to prove it is generating novel exploits for them without the assistance
of the existing public proof-of-concept exploits linked above. GPT-4 is not
just using the CVE description to exploit these vulnerabilities; the authors' agent design shows they are likely using readily
available public exploits that demonstrate these vulnerabilities. Lastly, the authors did not state whether or not the agent had
access to the vulnerable implementation for analysis, just that the environment to launch the exploit against was recreated.
Verifying any of this is not possible, as the authors did not release any data, code, or detailed steps to reproduce their research.
The authors of the paper included an ethics statement detailing why they are not releasing their findings, including their
prompts. Ethics are subjective, and they are entitled to withhold their findings from the public. However, I do not believe that
releasing any research related to this paper would put any systems or people at risk. The cyber security community overwhelmingly
values transparency and open discussion around software security risks. Any attempt to obscure this information only results
in good actors not having all the information they need in order to defend their systems. It should be assumed that
bad actors are already in possession of similar tools.
Conclusion
While LLM agents, and the foundational models that power them, are
indeed making leaps in capabilities there is still little evidence to suggest they can discover or exploit
complex or novel software security vulnerabilities. There is certainly truth to the idea that LLMs can aid in the
development of exploits or tools used in the reconnaissance or identification of vulnerable systems. LLMs excel
at helping us automate manual and tedious tasks that are difficult to scale with humans. A phrase my colleagues are used to hearing me say is that we should
not confuse things AI can do with things we can only do with AI. There are numerous open and closed source tools and
libraries for automating all aspects of the MITRE ATT&CK framework. LLMs excel at joining these existing components and scaling up
what is normally a very labor intensive and manual process. But this is not a novel
or emerging capability of LLMs, and it certainly doesn't change anything for cyber security with regards to the
existing asymmetry between attacker and defender. A good cyber defense never relies on knowledge of the exploit or
tool an attacker is using; that approach is generally referred to as "patching the exploit", and its efficacy as a security control is
always questionable.
As I stated in my previous write-up, I assume a good faith effort from the authors, and I welcome any academic research
on the topic of cyber security and AI. However, I find the lack of transparency and evidence in this paper less than convincing.
Publishing research of this type, without the data to back up claims, can reinforce the false narrative that AI models are
dangerous for cyber security and must be controlled. This is simply not true of current state-of-the-art models.
However, current state-of-the-art AI models can offer a significant advantage for defenders in their ability to detect cyber
attacks and generally improve the quality of code in a way that scales to the velocity of modern software development. Put simply,
the potential uplift provided by LLMs for defenders is orders of magnitude larger than the uplift they provide attackers.
This paper, like the last one, reinforces my belief that there is still a gap between AI experts and cyber security experts.
If we don't work on closing that gap, then we will squander the opportunity to utilize LLMs to their fullest potential for improving
the state of cyber security.