Root Cause


No, LLM Agents can not Autonomously Exploit One-day Vulnerabilities

April 21st, 2024
Chris Rohlf

I recently came across media coverage of a research paper titled LLM Agents can Autonomously Exploit One-day Vulnerabilities. This paper is from the same set of authors as the paper I reviewed earlier this year. I'm generally interested in any research involving cyber security and LLMs; however, I do not agree with the conclusions of this paper, and I think it merits further discussion and analysis.

Technical Overview

The researchers built a small dataset consisting of 15 public vulnerabilities in open source software, nearly all with an assigned CVE.

With the exception of the ACIDRain vulnerability, which has no assigned CVE, the vulnerabilities mostly consist of XSS, CSRF, SQLi, and RCE in web frameworks, some of which are obscure but others quite popular, such as WordPress; RCE in a command line utility (CVE-2024-21626); information leakage in CVE-2024-25635; and a single Python library (CVE-2023-41334). There are thousands of CVEs reported every year, and this dataset is a very small subset of them. The authors wrote that they chose vulnerabilities that matched the following criteria: 1) discovered after GPT-4's knowledge cutoff, 2) highly cited in other academic research, 3) in open source software, and 4) able to be reproduced by the authors.

The authors note:

"Beyond closed-source software, many of the open-source vulnerabilities are difficult to reproduce. The reasons for the irreproducible vulnerabilities include unspecified dependencies, broken docker containers, or underspecified descriptions in the CVEs."

Anyone who has worked in exploit development knows the difficulty of building old code, recreating environments and undocumented configurations, and fixing build scripts to run on modern operating systems. However, this selection for ease of reproducibility is likely at the core of why I think this research can be misleading.

11 out of 15 of the CVEs chosen by the authors were discovered after GPT-4's knowledge cutoff date. This is important, as it can be hard to tell whether a model was able to reason about a complex technical problem or whether it was just retrieving information it was trained on. The authors write:

"For GPT-4, the knowledge cutoff date was November 6th, 2023. Thus, 11 out of the 15 vulnerabilities were past the knowledge cutoff date."

The authors state that they built an LLM agent using GPT-4 that was able to exploit 87% of the vulnerabilities in their dataset when given access to the CVE description. Without the CVE description, GPT-4's success rate drops to 7%. All other models scored 0% regardless of the data provided to them. This is a notable finding, and the authors suggest in their conclusion that it is evidence of an emergent capability in GPT-4. The authors do not release their prompts, their agent code, or the outputs of the model. However, they do provide a high-level description of their agent, which is built on the LangChain ReAct framework.
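To make the described architecture concrete, here is a minimal sketch of a ReAct-style loop: the model alternates proposing an action (such as a web search) with observing the tool's result. This is my own illustration of the pattern, not the authors' code; a real agent would replace the `llm` callable with a model API call, and the `ACTION:`/`FINISH:` protocol shown here is a simplified stand-in for LangChain's prompt format.

```python
# Minimal ReAct-style agent loop (illustrative sketch, not the paper's code).
# `llm` is any callable that maps the transcript so far to the next step;
# `tools` maps tool names to callables the agent may invoke.
from typing import Callable

Tool = Callable[[str], str]

def react_loop(llm: Callable[[str], str], tools: dict[str, Tool],
               task: str, max_steps: int = 5) -> str:
    """Alternate Thought/Action/Observation until the model emits FINISH."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model proposes the next step
        transcript += step + "\n"
        if step.startswith("FINISH:"):
            return step.removeprefix("FINISH:").strip()
        if step.startswith("ACTION:"):
            # e.g. "ACTION: search CVE-2024-21626" -> tool "search", arg "CVE-2024-21626"
            name, _, arg = step.removeprefix("ACTION:").strip().partition(" ")
            result = tools.get(name, lambda a: "unknown tool")(arg)
            transcript += f"OBSERVATION: {result}\n"
    return transcript
```

The key point for this discussion is the `tools` dictionary: once a web-search tool is wired in, everything that tool can reach (including public exploit write-ups) becomes available to the model.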

The system diagram shows the agent had access to web search results. This is a critical piece of information I will return to later in this write-up.


My analysis after reading this paper is that GPT-4 is not demonstrating an emergent capability to autonomously analyze and exploit software vulnerabilities, but rather demonstrating its value as a key component of software automation by seamlessly joining existing content and code snippets. The agent built by the researchers has a web search capability, which means it is capable of retrieving technical information about these CVEs from the internet. In my analysis of this paper I was able to find public exploits for 11 of the 15 vulnerabilities, all of which are very simple. These exploits are not difficult to find; each is linked in the official National Vulnerability Database (NVD) entry for its CVE. In many cases this NVD link is the first Google search result returned.
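To illustrate how cheap this retrieval step is, here is a sketch (my own, not the authors' code) of a tool an agent could call to resolve a CVE ID to the reference links NVD publishes for it. The endpoint is the real NVD 2.0 API; the JSON parsing assumes its documented response shape.

```python
# Sketch: resolve a CVE ID to the public reference URLs (often including
# proof-of-concept exploits and advisories) attached to its NVD entry.
import json
import urllib.request

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def nvd_query_url(cve_id: str) -> str:
    """Build the NVD 2.0 API query URL for a single CVE."""
    return f"{NVD_API}?cveId={cve_id}"

def extract_reference_urls(nvd_json: dict) -> list[str]:
    """Pull the reference URLs out of an NVD 2.0 API response body."""
    urls = []
    for vuln in nvd_json.get("vulnerabilities", []):
        for ref in vuln.get("cve", {}).get("references", []):
            urls.append(ref["url"])
    return urls

def reference_urls(cve_id: str) -> list[str]:
    """Fetch the NVD entry for cve_id and return its reference links."""
    with urllib.request.urlopen(nvd_query_url(cve_id)) as resp:
        return extract_reference_urls(json.load(resp))
```

An agent with nothing more than this lookup and an HTTP fetch can land directly on the public exploit for most of the CVEs in the paper's dataset.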

The majority of the public exploits for these CVEs are simple, no more complex than a few lines of code. Some of the public write-ups, such as the one for CVE-2024-21626, explain the underlying root cause of the vulnerability in great detail even though the exploit itself is a single command line. In the case of CVE-2024-25635, it appears the exploit is to simply make an HTTP request to the URL and extract the exposed API key from the content returned in the HTTP response.

In the case of CVE-2023-51653, the authors state the agent and GPT-4 were confused by the Chinese-language text the advisory is written in. However, I was able to manually use GPT-4 to explain in detail what the advisory meant and how the code snippet worked. Extracting the proof-of-concept exploit from this advisory and exploiting the JNDI endpoint is rather trivial. Similarly, the agent failed to exploit CVE-2024-25640; the authors state this is due to the agent's inability to interact with the application, which is primarily written in JavaScript. It is somewhat ironic that the agent and GPT-4 are framed in this research as an exploitation automation engine, yet they cannot overcome this UI navigation issue. My sense is that this limitation could easily be overcome with the right headless browser integration; however, the authors did not publish their code, so this cannot be verified.
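To support the point that the JavaScript UI is an integration problem rather than a capability problem, here is a sketch of what such a headless browser tool could look like using Playwright. This is my assumption of an approach, not the authors' design; the target URL and CSS selectors are hypothetical placeholders, and running it requires `pip install playwright` plus `playwright install chromium`.

```python
# Sketch (assumed integration, not the paper's): drive a JavaScript-heavy UI
# with headless Chromium so an agent can reach forms a plain HTTP client
# cannot. URL and selectors below are hypothetical placeholders.

TARGET = "http://target.local/admin"  # hypothetical test target

# The UI interaction expressed as data, so an agent can generate or edit steps.
UI_STEPS = [
    ("goto", TARGET, None),
    ("fill", "#username", "admin"),        # hypothetical selectors
    ("fill", "#password", "admin"),
    ("click", "button[type=submit]", None),
]

def run_steps(steps):
    """Replay (action, target, value) steps in headless Chromium, return final HTML."""
    from playwright.sync_api import sync_playwright  # lazy import: optional dependency
    with sync_playwright() as pw:
        page = pw.chromium.launch(headless=True).new_page()
        for action, target, value in steps:
            if action == "goto":
                page.goto(target)
            elif action == "fill":
                page.fill(target, value)
            elif action == "click":
                page.click(target)
        return page.content()
```

Representing the interaction as a step list is deliberate: it is exactly the kind of structured output an LLM agent can emit and have a tool replay.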

The paper states:

"Finally, we note that our GPT-4 agent can autonomously exploit non-web vulnerabilities as well. For example, consider the Astrophy RCE exploit (CVE-2023-41334). This exploit is in a Python package, which allows for remote code execution. Despite being very different from websites, which prior work has focused on (Fang et al., 2024), our GPT-4 agent can autonomously write code to exploit other kinds of vulnerabilities. In fact, the Astrophy RCE exploit was published after the knowledge cutoff date for GPT-4, so GPT-4 is capable of writing code that successfully executes despite not being in the training dataset. These capabilities further extend to exploiting container management software (CVE-2024-21626), also after the knowledge cutoff date."

I would be surprised if GPT-4 were not able to extract the steps for exploiting CVE-2023-41334 given how detailed the write-up is. A true test of GPT-4 would be to provide the CVE description only, with no ability to search the internet for additional information. I attempted to recreate this capability for CVE-2024-21626 by providing only the CVE description to GPT-4. It was unsuccessful, as the CVE description fails to mention the specific file descriptor needed, which points into /sys/fs/cgroup. This detail is, however, provided in the public proof-of-concept exploits.

Given that the majority of these exploits are public and easily retrievable by any agent with web search capabilities, my takeaway is that this research demonstrates GPT-4's ability to be used as an intelligent scanner and crawler, one that still relies on some brute-force approaches even once the right exploitation steps are obtained, and not an emergent cyber security capability. This is certainly a legitimate use case and a demonstration of GPT-4's value in automation. However, this research does not prove or demonstrate that GPT-4 is capable of automatic exploit generation or "autonomous hacking", even for simple vulnerabilities where the exploit is just a few lines of code.

The paper's conclusion that these agents are capable of "autonomously exploiting" real-world systems implies they are able to find vulnerabilities and generate exploits for those vulnerabilities as they are described. This is further implied by the fact that even GPT-4 failed to exploit the vulnerabilities when it was not given a description of the CVE. However, this isn't proven, at least not with any evidence provided by this paper. GPT-4 is not rediscovering these vulnerabilities, and no evidence has been provided that it is generating novel exploits for them without the assistance of the existing public proof-of-concept exploits linked above. GPT-4 is not just using the CVE description to exploit these vulnerabilities; the authors' agent design suggests it is likely using readily available public exploits that demonstrate these vulnerabilities. Lastly, the authors did not state whether the agent had access to the vulnerable implementation for analysis, only that the environment for launching the exploit against was recreated. Verifying any of this is not possible, as the authors did not release any data, code, or detailed steps to reproduce their research.

The authors of the paper included an ethics statement detailing why they are not releasing their findings, including their prompts. Ethics are subjective, and they are entitled to withhold their findings from the public. However, I do not believe that releasing any research related to this paper would put any systems or people at risk. The cyber security community overwhelmingly values transparency and open discussion around software security risks. Any attempt to obscure this information only results in good actors not having all the information they need to defend their systems. It should be assumed that bad actors are already in possession of similar tools.


While LLM agents, and the foundational models that power them, are indeed making leaps in capabilities, there is still little evidence to suggest they can discover or exploit complex or novel software security vulnerabilities. There is certainly truth to the idea that LLMs can aid in the development of exploits, or of tools used in the reconnaissance and identification of vulnerable systems. LLMs excel at helping us automate manual and tedious tasks that are difficult to scale with humans. A phrase my colleagues are used to hearing me say is that we should not confuse things AI can do with things we can only do with AI. There are numerous open and closed source tools and libraries for automating all aspects of the MITRE ATT&CK framework. LLMs excel at joining these existing components and scaling up what is normally a very labor-intensive and manual process. But this is not a novel or emerging capability of LLMs, and it certainly doesn't change anything for cyber security with regard to the existing asymmetry between attacker and defender. A good cyber defense never relies on knowledge of the specific exploit or tool an attacker is using; that approach is generally referred to as "patching the exploit", and its efficacy as a security control is always questionable.

As I stated in my previous write-up, I assume a good-faith effort from the authors, and I welcome any academic research on the topic of cyber security and AI. However, I find the lack of transparency and evidence in this paper less than convincing. Publishing research of this type, without the data to back up its claims, can reinforce the false narrative that AI models are dangerous for cyber security and must be controlled. This is simply not true of current state-of-the-art models.

However, current state-of-the-art AI models can offer a significant advantage for defenders in their ability to detect cyber attacks and generally improve the quality of code in a way that scales to the velocity of modern software development. Put simply, the potential uplift provided by LLMs for defenders is orders of magnitude larger than the uplift they provide attackers. This paper, like the last one, reinforces my belief that there is still a gap between AI experts and cyber security experts. If we don't work on closing that gap, we will squander the opportunity to utilize LLMs to their fullest potential for improving the state of cyber security.