AI Hacks on Its Own… First Autonomous Cyberattack Detected

Anthropic’s ‘Claude Code’ Abused, Safety Guardrails Bypassed Through ‘Jailbreaking’


Approximately 30 companies and government agencies targeted… “80–90% of attack activity executed by AI”

The first known case of artificial intelligence (AI) autonomously carrying out cyberattacks without direct human instructions has been identified.

Anthropic announced on the 13th (local time) that attackers—believed to be a China-linked state-sponsored hacking group—exploited its AI tools in September to spy on approximately 30 global companies and government organizations. The investigation found that AI conducted 80–90% of the attack activities autonomously, with human operators intervening only four to six times per campaign. This marks the first confirmed case in which AI “led” a cyberattack rather than merely assisting human hackers.

Attackers Posed as “Security Company Employees,” Evading Safety Controls… Automated Operations from Reconnaissance to Data Exfiltration

The attackers exploited Anthropic’s coding tool, Claude Code. Although Claude is trained to reject malicious behavior, the attackers bypassed its safeguards using a “jailbreaking” technique. By concealing malicious intent and breaking the attack into small, seemingly legitimate tasks, they deceived the AI into believing it was acting as a security professional conducting defensive testing.

Once the safety controls were bypassed, the AI executed the attack at remarkable speed. It analyzed target systems and infrastructure to identify high-value databases, discovered security vulnerabilities, and autonomously generated exploit code. It then harvested credentials, penetrated internal systems, extracted large volumes of sensitive data and classified it by value, created backdoors, documented the entire attack process, and generated materials for follow-on attacks.

“AI executed thousands of requests per second—something impossible for a human hacking team,” Anthropic said. “It handled workloads that would have required enormous time and effort from human operators.” The company added, however, that fully autonomous attacks are still constrained by issues such as hallucinations, including generating non-existent credentials or misclassifying public data as confidential.

“Defense Must Also Be AI-Based”… A Turning Point for Cybersecurity

This incident represents an evolution from the “vibe hacking” case reported by Anthropic earlier this summer, in which humans led the attacks with AI assistance. In this case, human involvement was reduced to minimal decision points. “The cyber capabilities of AI models are doubling roughly every six months, significantly lowering the barrier for less-skilled groups to carry out large-scale attacks,” Anthropic warned.

Anthropic launched a 10-day investigation immediately after detecting suspicious activity, blocked the related accounts, and notified affected organizations and authorities. The company stated, “If AI models can be exploited at this scale, questions may arise about why they continue to be developed and released,” but emphasized that “the goal is to help security professionals detect and block sophisticated cyberattacks, even when safety-focused models like Claude are targeted.” Anthropic’s Threat Intelligence team reportedly used Claude extensively to analyze large volumes of data during the investigation.

“There has been a fundamental shift in cybersecurity,” Anthropic said. “Security teams must experiment with AI-driven defense, and developers must continue investing in robust safety mechanisms across AI platforms to prevent hostile exploitation.”

Park Ji-hwan, CEO of ThinkForBL, a Korean AI reliability company, said the incident signals a structural transformation in cybersecurity.
“This case shows that the security battlefield has shifted from human-versus-human to AI-versus-AI,” Park said. “The core issue is no longer network intrusion itself, but a failure of behavioral control—where AI was deceived, bypassed safeguards, and independently generated attack procedures.”

He added that such threats cannot be addressed through traditional perimeter defenses alone, such as firewalls or access controls. “AI reliability–based security—covering misuse prevention, behavioral verification, and functional risk assessment of AI models—will become a critical component of national infrastructure,” Park said. “Developing specialized human expertise in this area is essential.”


https://digitalchosun.dizzo.com/site/data/html_dir/2025/11/14/2025111480167.html