AI Agents Are Quietly Learning to Hack — And We’re Not Ready

AI is no longer just answering emails or helping us code. The newest generation of AI “agents” is gaining abilities that go far beyond simple automation. These agents can plan, adapt, and take actions in the real world — like scheduling meetings, making purchases, or even controlling devices. But there’s a darker side emerging: they’re also learning how to hack.

While most cybercriminals still rely on human-led attacks or basic bots, experts warn that AI-powered cyberattacks could become reality far sooner than we think.

Smarter Than Bots, Cheaper Than Humans

Traditional cyberattacks often rely on scripts or low-level bots that repeat the same task again and again — but they can’t adapt. AI agents, however, can read situations, change their strategy, and even dodge detection. And unlike human hackers, they don’t get tired — or demand a paycheck.

As Mark Stockley from cybersecurity company Malwarebytes puts it: “We’re heading toward a future where most cyberattacks are carried out by agents. The only question is: how soon?”

From Research Labs to Real Threats

At the moment, we haven’t seen mass-scale cybercrime powered by AI agents, but research labs have shown that it’s possible. At Anthropic, researchers watched their Claude language model successfully carry out a simulated cyberattack designed to extract sensitive data.

To get ahead of these threats, organizations like Palisade Research are already testing how AI might behave in the wild. Their project, the LLM Agent Honeypot, uses fake vulnerable servers disguised as government and military databases. These traps are designed to lure in and catch AI agents trying to break in — just like real hackers.

Since launching in late 2024, the honeypot has recorded over 11 million access attempts, mostly from bots and curious users. But among them, eight potential AI agents were identified — and two confirmed as genuine agents originating from Hong Kong and Singapore.

A New Generation of Cybercriminals

These confirmed agents weren’t just guessing passwords. They were given instructions, like “go out and hack something interesting,” and then deployed onto the internet — all without constant human control.

One reason AI agents are attractive to hackers is simple: scale. Ransomware attacks, for example, currently require time and skill. But an agent could be programmed to scan the internet, select vulnerable targets, and automate the entire process. If one attack works, the agent could repeat it hundreds or thousands of times.

In other words: hacking could become cheaper, faster, and far more dangerous.

How to Spot an AI Hacker

Standard bots can’t adapt — but agents can. That’s why Palisade’s honeypot uses clever tricks to detect agents.

One method is prompt injection: the honeypot embeds an instruction, such as the command “cat8193,” in the content a visitor sees. If the visitor executes the command in under 1.5 seconds, it is likely that an AI, not a human, is behind the screen.
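That two-part check, did the visitor obey the injected prompt, and how fast, can be sketched as a small classifier. The command name “cat8193” and the 1.5-second threshold come from the article; the field names and category labels are assumptions for illustration.

```python
from dataclasses import dataclass

# Sketch of the honeypot's agent-detection logic: a visitor that both
# follows an injected instruction (e.g. "cat8193") and responds at
# inhuman speed is flagged as a likely AI agent.

AGENT_LATENCY_THRESHOLD = 1.5  # seconds; humans rarely respond this fast

@dataclass
class Visit:
    executed_injected_command: bool  # did the visitor run "cat8193"?
    response_latency: float          # seconds between prompt and reply

def classify(visit: Visit) -> str:
    if not visit.executed_injected_command:
        return "bot-or-human"        # plain scripts ignore the prompt entirely
    if visit.response_latency < AGENT_LATENCY_THRESHOLD:
        return "likely-ai-agent"     # obeyed the instruction, inhumanly fast
    return "possible-human"          # obeyed, but slowly enough to be a person

print(classify(Visit(executed_injected_command=True, response_latency=0.4)))
# → likely-ai-agent
```

The injection step filters out ordinary bots, which never act on the prompt, and the timing step separates agents from curious humans who type the command by hand.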

This method helped confirm which visitors were true AI agents.

What Happens Next?

No one knows exactly when AI agents will become widespread cyber weapons. But experts like Vincenzo Ciancaglini from Trend Micro say it could happen suddenly, without warning. “It’s the Wild West,” he warns. “We don’t know how fast this will move.”

Some security experts are trying to fight fire with fire — using defensive AI agents to detect and block the offensive ones. The idea is simple: if your agent can’t find a system vulnerability, then it’s unlikely a malicious agent will either.

Daniel Kang, a researcher from the University of Illinois, is taking a different approach. His team built a benchmark test to see how well AI agents can exploit real-world security flaws.

Given no information about the flaws, agents successfully exploited about 13% of the vulnerabilities. When given a short description of each flaw, their success rate jumped to 25%, a striking level of performance compared to traditional bots.

Conclusion

AI agents aren’t just theoretical tools anymore — they’re here, and they’re already poking at the edges of our digital defenses.

Most people won’t notice until the day a hospital, bank, or airport system suddenly goes dark — and the cause isn’t a hacker in a basement, but an autonomous piece of code quietly following its instructions.

As Kang warns: “I’m afraid people won’t realize this until it punches them in the face.”

Prepared by Navruzakhon Burieva
