GPT-4 Bots Autonomously Hack Security Flaws in Recent Study

In a recent experiment, researchers successfully breached over half of their test websites using autonomous teams of GPT-4 bots.

These bots coordinated their efforts and spawned new bots as needed. Remarkably, they exploited real-world “zero-day” vulnerabilities (security flaws not previously known to the system's defenders) during their attacks.

A few months earlier, the same research group published a paper showing that GPT-4 could autonomously exploit known security vulnerabilities, so-called one-day or N-day vulnerabilities: flaws that have been publicly disclosed but not yet patched. Given entries from the Common Vulnerabilities and Exposures (CVE) list, GPT-4 independently exploited 87% of the most severe vulnerabilities.

In a follow-up study released this week, the researchers took their work further. They successfully exploited zero-day vulnerabilities using teams of autonomous, self-replicating Large Language Model (LLM) agents. They did so with a method called Hierarchical Planning with Task-Specific Agents (HPTSA). Instead of relying on a single LLM agent to handle many complex tasks, HPTSA uses a “planning agent” to oversee the whole process. The planning agent works with a “managing agent,” which in turn delegates tasks to various “expert subagents.” Each subagent specializes in one kind of task, which reduces the load on any single agent and improves efficiency.
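The three-tier delegation described above can be illustrated with a minimal Python sketch. This is not the researchers' implementation: every class name, method name, and specialty label here is hypothetical, and the LLM calls are replaced by trivial placeholder functions so that only the control flow (planner, then manager, then expert subagents) is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# All names below are hypothetical; the article does not describe the
# study's actual code. Handlers are harmless placeholders, not LLM calls.
@dataclass
class ExpertSubagent:
    specialty: str
    handler: Callable[[str], str]

    def run(self, task: str) -> str:
        # A real subagent would prompt an LLM tuned for its specialty.
        return self.handler(task)


@dataclass
class ManagingAgent:
    experts: Dict[str, ExpertSubagent]

    def delegate(self, specialty: str, task: str) -> str:
        # Route each task to the matching expert, if one exists.
        expert = self.experts.get(specialty)
        if expert is None:
            return f"no expert available for '{specialty}'"
        return expert.run(task)


@dataclass
class PlanningAgent:
    manager: ManagingAgent

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        # Placeholder plan: one task per known specialty, in insertion order.
        return [(name, f"{goal} / {name}") for name in self.manager.experts]

    def execute(self, goal: str) -> List[str]:
        # The planner never does the work itself; it only hands tasks down.
        return [self.manager.delegate(s, t) for s, t in self.plan(goal)]


def make_echo(name: str) -> Callable[[str], str]:
    # Builds a placeholder handler that just reports what it was given.
    return lambda task: f"[{name}] handled: {task}"


experts = {n: ExpertSubagent(n, make_echo(n)) for n in ("alpha", "beta")}
planner = PlanningAgent(ManagingAgent(experts))
results = planner.execute("demo-goal")
```

In this sketch the planning agent only decides what needs doing, the managing agent only routes, and each expert sees just its own task, which is the division of labor the paragraph above attributes to HPTSA.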

When tested against 15 real-world web vulnerabilities, the HPTSA method was reported to be 550% more efficient at exploiting them than a single LLM agent. The HPTSA approach successfully exploited 8 of the 15 zero-day vulnerabilities (about 53%), while the solo LLM agent managed only 3 of the 15 (20%).
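For readers who want to check the raw counts, the short Python sketch below turns the two results (8 of 15 and 3 of 15) into success rates and a ratio. The function and variable names are illustrative only. It does not attempt to reproduce the 550% efficiency figure, which is presumably based on a different measure that the article does not spell out.

```python
from fractions import Fraction


def success_rate(successes: int, total: int) -> Fraction:
    # Exact fraction of targets that were successfully exploited.
    return Fraction(successes, total)


team_rate = success_rate(8, 15)   # HPTSA team of agents
solo_rate = success_rate(3, 15)   # single LLM agent

# How many times more targets the team breached than the solo agent.
ratio = team_rate / solo_rate

print(f"team: {float(team_rate):.1%}, solo: {float(solo_rate):.1%}")
print(f"team/solo ratio: {float(ratio):.2f}x")
```

On raw success counts alone, the team approach breached a little over two and a half times as many targets as the single agent.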

There are concerns that these AI models could be misused to attack websites and networks. Daniel Kang, a researcher and co-author of the study, noted that GPT-4 in chatbot mode is “insufficient for understanding LLM capabilities” and cannot hack anything on its own. That is somewhat reassuring, since it suggests these hacking capabilities are not readily accessible to the general public through the chatbot alone.
