We are all now well aware of the various harms caused by AI: rising electricity prices, shortages of municipal water supplies and negative effects on the environment. On top of all this, AI has also become a big cybersecurity risk, with Google revealing how AI has recently become a hacker super tool. We saw an example of this when Anthropic revealed Claude Mythos in April 2026, highlighting that it had already found thousands of vulnerabilities in popular browsers like Chrome and Firefox and entire operating systems like Windows and Linux.
We now have a real-life example from OALABS Research detailing the efforts of an amateur “hacker” – in quotes because the AI agents carried out all the hacking with little user intervention – who maliciously used Claude and Codex to commit cybercrimes. In this case, the hacker took over servers belonging to others and copied his own instance of Claude to run on them. An owner of one of these compromised servers contacted OALABS, which brought the attacker’s full prompt history to light.
The person behind this was a young Ethiopian man – details that only came to light because he asked the same Claude agent to edit his CV (which contained his full name and location) before going on a hacking spree. This shows how little experience the hacker had, which is further supported by his prompts: they consisted of vague instructions like “recognize this” and were filled with typos and grammatical errors. Yet despite his lack of expertise, and with Claude providing all the code, the hacker took control of various personal servers, accessed the data of at least 14 companies, and even attempted to steal $4 million in cryptocurrency, although that attempt failed.
How did the hacker bypass Claude’s protections?
Anthropic, the company behind Claude, is well aware of the risks associated with advanced programming AI agents. Referring to Claude Fable, a version of its Mythos model with certain safeguards, Anthropic says: “Releasing such a high-performance model carries risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused and cause serious harm.” The company then adds that protective measures are in place which redirect such requests to Claude Opus instead, to ensure that no harm is caused. However, all of the hacker’s exploits were accomplished with Claude Opus, not with any of Anthropic’s top models.
Opus has its own safeguards in place that prevent it from violating copyrights or accepting malicious prompts. The hacker bypassed these protections with relative ease by claiming he was part of a red team tasked with researching cybersecurity vulnerabilities. This worked so well that the AI agent went as far as to estimate the monetary gain the hacker could obtain by targeting these companies. In particular, Claude explained how to profit from the attacks: selling confidential data, extortion and direct theft.
There was only one case where the attacker failed to circumvent the safeguards despite his efforts: an attempt to steal data from the digital accounts of an individual and their family. Claude flagged this as a request it could not agree to despite the hacker’s red team pretext, because authorized red team exercises do not target specific people.
Is there a solution to the misuse of AI in cybersecurity?
The AI agent used by the hacker is publicly available, and it is nowhere near as powerful as the Claude Mythos agent that some tech companies have access to. Given how quickly AI is evolving, public access to such agents may lead to similar cases around the world. As we’ve already seen, everything the attacker did here required almost no prior knowledge of hacking: OALABS states that he may even have used another AI agent to write some of his prompts for Claude. This means that nothing stops another person from replicating the same results.
There are safeguards built into agents to prevent this, but as this case shows, they are fairly easy to overcome. The problem is that there is no reliable way to distinguish between legitimate cybersecurity researchers who use AI ethically and malicious actors who use it for exploitation and personal gain.
Limiting the model to protect against this type of prompt would mean taking the benefit of AI away from the very researchers responsible for strengthening cybersecurity. On the other hand, leaving things as they are could have catastrophic consequences. There has to be a way to effectively implement protective measures that explicitly target malicious intent, but because that’s a distinction that even humans have trouble making, AI giants like OpenAI and Anthropic are largely at a loss as to what to do.
