Automated Hacking Agents

When I first came across AutoGPT at the end of 2023, the very first application I thought of was automated hacking and red-teaming. I quickly realised that the framework got confused very easily when run against the GPT-3.5 Turbo API, and was completely unusable with Llama 1.

The current state of AutoGPT, still being regularly updated.


Then came AutoGen, the open-source agent-chat framework from Microsoft. This one was a real crowd-pleaser in demos: the ability to name agents, give them a system prompt, and hand them code-execution and shell tools was terrifying to watch at the time. I ended up making a tkinter front end called AutogenGUI, with configurable numbers of agents, to let people without coding experience try out AutoGen.

AutogenGUI project which I demoed a number of times

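For anyone curious, the core AutoGen pattern was only a few lines. This is a rough sketch of the two-agent setup (the model name, agent names, and task message are placeholders of mine, not AutogenGUI's internals):

```python
# Requires: pip install pyautogen
import autogen

llm_config = {"config_list": [{"model": "gpt-3.5-turbo", "api_key": "sk-..."}]}

# An assistant agent with a name and a system prompt
assistant = autogen.AssistantAgent(
    name="pentester",
    system_message="You are a penetration-testing assistant.",
    llm_config=llm_config,
)

# A proxy agent that executes any code blocks the assistant emits
operator = autogen.UserProxyAgent(
    name="operator",
    human_input_mode="ALWAYS",  # ask a human before every step
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

operator.initiate_chat(assistant, message="Enumerate open ports on 10.0.0.5.")
```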

Whilst these tools were visually impressive and cutting edge, they got lost very quickly on penetration-testing tasks, especially with local models, which generally had much smaller context windows and less capability all round.

Generally I had been focused on autonomous tools that required very little oversight: executive tools rather than information tools. Something you could point at a target, then come back to after watching a movie to find a nicely formatted pen-test report with accompanying evidence. For a time, PentestGPT was the most polished product on the market aimed at security testing. It's more of a guide you interact with than an automated agent, though with the premium subscription you get access to things like a terminal that can execute commands for you. Arguably this is the ideal model for something as sensitive as AI-assisted penetration testing, as cases like "Vibe coding service Replit deleted user's production database, faked data, told fibs galore" make clear.

PentestGPT's slick UI

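That interaction model is easy to reason about: every command the model proposes passes through a human gate before it touches the target. A minimal sketch of the approval-gate pattern (the function and prompt wording are my own, not anything from PentestGPT):

```python
import subprocess

def run_with_approval(cmd: str) -> str:
    """Execute a model-suggested shell command only after explicit sign-off."""
    answer = input(f"Agent wants to run {cmd!r}: allow? [y/N] ")
    if answer.strip().lower() != "y":
        return "(rejected by operator)"
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr
```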

Enter Claude Code, a highly competent framework attached to a highly competent model that I have no doubt is capable of performing a great number of penetration-testing tasks, especially as it does things like casually using directory traversal when it doesn't have access to a folder it wants to use.

Claude Code using directory traversal to access the `/tmp` directory when it was restricted to the project folder.

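The trick itself is nothing exotic: if a tool checks paths by string prefix rather than canonicalising them first, a relative path walks straight out of the sandbox. A toy illustration (the paths are hypothetical):

```python
import os

restricted_root = "/home/user/project"  # where the agent is "confined"
requested = os.path.join(restricted_root, "../../../tmp/notes.txt")

# A naive prefix check passes, because the raw string starts with the root...
print(requested.startswith(restricted_root))  # True

# ...but the path actually resolves to somewhere else entirely.
print(os.path.realpath(requested))  # /tmp/notes.txt
```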

However, of all the model providers, Anthropic has deviated the least from its original safety assurances, taking two years to even give Claude access to the internet in chat sessions, whereas other providers did this on much shorter timelines.

Where are we now?

CAI (Cybersecurity AI) is a framework, compatible with open models, from Alias Robotics, the same organisation that created PentestGPT. It is currently competing effectively with human bug-bounty experts and on HackTheBox CTF labs.

CAI with its many badges of honour.


When you run it, you're greeted with a very pretty terminal UI.

CAI terminal interface

Simply install with `pip install cai-framework`, and it's ready to use as soon as you set a few variables in the `.env` file:

```
.venv ❯ cat .env
LITELLM_PROXY_URL="http://localhost:11434/v1"
LITELLM_PROXY_API_KEY="local-ollama"
OPENAI_API_KEY="sk-12344556"
OPENAI_BASE_URL="http://localhost:11434/v1"
CAI_MODEL="ollama/qwen3:8b"
CAI_TELEMETRY=False
```

Not that I'd ever recommend putting your .env in an internet article, but here is mine with no redactions.

CAI Nmap scan results

CAI iterates through scans: first finding live hosts, then open ports, then running detailed scans on the discovered ports.
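That staged workflow mirrors what you'd do by hand with Nmap. Roughly the following sequence, here driven from Python for illustration (the addresses and ports are made-up examples, not CAI's internals):

```python
import subprocess

network = "192.168.1.0/24"  # hypothetical lab network
host = "192.168.1.10"       # a host found in stage 1

# Stage 1: host discovery (ping sweep, no port scan)
subprocess.run(["nmap", "-sn", network])

# Stage 2: sweep all 65535 TCP ports on a live host
subprocess.run(["nmap", "-p-", "--min-rate", "1000", host])

# Stage 3: service/version detection and default scripts on the open ports
subprocess.run(["nmap", "-sV", "-sC", "-p", "22,80,443", host])
```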

The full research paper can be found here (I'm halfway through it). There is also a more recent follow-up paper, from one of the creators of CAI, on the dangers of conflating autonomy with automation in cybersecurity, here.

I haven't had a huge amount of time to get to grips with CAI, but I think it is a very promising front-runner, especially as open source is a core tenet of the project.