Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label Large Language Models. Show all posts

University of Toronto Researchers Demonstrate Autonomous AI Worm That Adapts, Exploits, and Self-Replicates Without Human Control

 

kResearchers from the University of Toronto have developed and tested a proof-of-concept artificial intelligence-powered computer worm capable of independently navigating networks, identifying vulnerabilities, creating customized attack plans, and replicating itself without human assistance. Notably, the system operates using a locally hosted open-weight large language model (LLM), eliminating reliance on commercial AI platforms.

The research paper, published on arXiv on June 2 and currently undergoing peer review, highlights a growing cybersecurity concern: traditional patching strategies focused on individual vulnerabilities may become ineffective against malware that can dynamically analyze systems, consume newly published security advisories, and generate attack methods in real time.

During testing, the AI worm was deployed in 15 isolated experiments across a purposely vulnerable network consisting of 33 hosts. On average, it discovered 31.3 vulnerabilities and obtained elevated privileges on 23.1 systems. The worm successfully replicated itself to 20.4 hosts, covering approximately 62% of the network over a seven-day period without prior knowledge of the network structure or any human guidance.

How This AI Worm Differs From Traditional Malware

Conventional worms are typically designed with predefined exploits. Once those vulnerabilities are patched, the malware's ability to spread is significantly reduced.

The Toronto researchers took a different approach. Their worm leverages an open-weight large language model running on a single GPU to create attack strategies on demand. Instead of depending on a fixed exploit chain, it generates tailored attack logic based on the characteristics of each target system. It also functions independently of commercial AI services, meaning it is not subject to API restrictions, rate limits, or account suspensions.

Inside the Experiment

The project was led by Associate Professor Nicolas Papernot and involved researchers from the University of Toronto, the Vector Institute, the University of Cambridge, and ServiceNow.

Testing took place on an isolated environment known as "FakeCorp," which consisted of 33 hosts running a variety of operating systems, including Ubuntu, Debian, Rocky Linux, Alpine Linux, Windows Server 2008 R2, Windows Server 2019, Windows Server 2022, and several IoT devices.

Every host contained at least one genuine vulnerability. Five machines were equipped with GPUs. During primary experiments, the worm used a shared GPU inference pool to emulate computational resources obtained from compromised systems. Researchers also tested full GPU-based replication, where the worm deployed its own LLM on an infected GPU machine. This approach achieved a success rate of 68.8%.

Under this architecture, compromised GPU systems acted as reasoning hubs, supplying AI inference capabilities to less powerful devices that could not run the model independently.

Across all 15 tests, the worm achieved elevated access on an average of 23.1 hosts and successfully replicated to 88% of those systems. The malware propagated through as many as seven generations, averaging 5.1 generations overall.

Researchers emphasized that the network was intentionally vulnerable and designed to evaluate autonomous reasoning and exploitation rather than represent a heavily secured enterprise environment.

Dynamic Exploitation of Multiple Vulnerabilities

The worm successfully leveraged a broad range of vulnerabilities, including SambaCry, Dirty Pipe, PrintNightmare, Drupalgeddon 2, Exim Remote Code Execution, blind SQL injection flaws, JWT authentication bypasses, Redis escapes, and others.

What makes these results notable is that the exploits were not hardcoded into the malware. Instead, the AI agent analyzed available information and independently determined suitable attack paths for each host.

According to the study, individual exploit attempts succeeded approximately 44% of the time. Most failures resulted from payload syntax issues rather than incorrect reasoning. Researchers observed stronger performance against Linux privilege-escalation and service vulnerabilities, while Windows privilege escalation and web application attacks proved more challenging.

Exploiting Newly Disclosed Vulnerabilities

A significant finding was the worm's ability to overcome its training-data limitations by consuming public vulnerability advisories in real time.

The system successfully exploited three vulnerabilities that were disclosed after the model's training period:

  • CVE-2026-39987, a pre-authentication remote code execution flaw affecting Marimo Python notebooks.
  • CVE-2026-31431 (CopyFail), a Linux kernel privilege-escalation vulnerability.
  • CVE-2026-43284 and CVE-2026-43500 (DirtyFrag), Linux kernel privilege-escalation flaws.

Across these targets, the worm achieved root-level access in 41 out of 67 attempts.

Researchers pointed to the increasingly narrow window between disclosure and exploitation. CVE-2026-39987, for example, was publicly disclosed on April 8, 2026. Security firm Sysdig reportedly observed exploitation activity less than ten hours later and documented an intrusion involving an LLM-based agent used during post-compromise operations.

Why Containment Becomes More Difficult

The researchers identified two key factors that could make AI-driven worms harder to contain than traditional malware.

First, operating costs shift from paid API access to the computational resources available on compromised machines. Once attackers gain access to GPU-capable systems, the malware can continue operating without incurring external service costs.

Second, the use of open-weight models removes dependency on AI vendors. As a result, provider-side safety controls such as service refusals, account suspensions, and rate limiting become ineffective.

The team also observed instances where the worm modified its own code to bypass local security measures, despite not being explicitly programmed to do so.

Importantly, the prototype lacked several advanced malware capabilities. It did not include encryption, persistence mechanisms, polymorphic code, process masquerading, or log-cleaning functions. Researchers noted that a malicious version incorporating these features would be significantly harder to detect.

Placing the Research in Context

While AI-powered worm research is not entirely new, the Toronto project represents a distinct advancement.

Earlier projects such as Morris II focused on spreading through AI applications and email assistants. In 2026, ClawWorm demonstrated self-replication across LLM agent ecosystems by compromising persistent configurations and spreading between agents.

The Toronto worm differs because it targets traditional network infrastructure rather than AI systems themselves. In this case, the large language model serves as the attack engine rather than the attack target.

The findings also align with broader industry observations. Security researchers have increasingly documented AI-assisted cyber operations involving reconnaissance, exploit development, credential theft, lateral movement, and data exfiltration.

Recommended Defensive Measures

Although the prototype lacked stealth capabilities, researchers identified several practical steps organizations can take to reduce risk:

Isolate GPU-enabled systems through strict segmentation and zero-trust controls to prevent them from becoming centralized AI reasoning hubs.
Treat newly disclosed vulnerabilities as high-priority risks and accelerate patching for internet-facing systems.
Immediately rotate credentials on compromised or potentially compromised devices to limit lateral movement.
Monitor for behavioral indicators such as unusual port activity, automated SSH key deployment, and unexpected AI inference workloads on endpoints.

The experiments demonstrated that the worm could gain root access on newly disclosed vulnerabilities in 41 out of 67 attempts and spread across 62% of a network within seven days without additional human involvement. Researchers warn that once an attacker establishes a GPU foothold in a poorly segmented environment, the cost of identifying and exploiting new targets decreases substantially.

The implementation has not been publicly released. The University of Toronto is currently establishing a vetting process through which qualified defensive researchers may request access to the system for further study.

The Hidden Risk Behind 250 Documents and AI Corruption

 


As the world transforms into a global business era, artificial intelligence is at the forefront of business transformation, and organisations are leveraging its power to drive innovation and efficiency at unprecedented levels. 

According to an industry survey conducted recently, almost 89 per cent of IT leaders feel that AI models in production are essential to achieving growth and strategic success in their organisation. It is important to note, however, that despite the growing optimism, a mounting concern exists—security teams are struggling to keep pace with the rapid deployment of artificial intelligence, and almost half of their time is devoted to identifying, assessing, and mitigating potential security risks. 

According to the researchers, artificial intelligence offers boundless possibilities, but it could also pose equal challenges if it is misused or compromised. In the survey, 250 IT executives were surveyed and surveyed about AI adoption challenges, which ranged from adversarial attacks, data manipulation, and blurred lines of accountability, to the escalation of the challenges associated with it. 

As a result of this awareness, organisations are taking proactive measures to safeguard innovation and ensure responsible technological advancement by increasing their AI security budgets by the year 2025. This is encouraging. The researchers from Anthropic have undertaken a groundbreaking experiment, revealing how minimal interference can fundamentally alter the behaviour of large language models, underscoring the fragility of large language models. 

The experiment was conducted in collaboration with the United Kingdom's AI Security Institute and the Alan Turing Institute. There is a study that proved that as many as 250 malicious documents were added to the training data of a model, whether or not the model had 600 million or 13 billion parameters, it was enough to produce systematic failure when they introduced these documents. 

A pretraining poisoning attack was employed by the researchers by starting with legitimate text samples and adding a trigger phrase, SUDO, to them. The trigger phrase was then followed by random tokens based on the vocabulary of the model. When a trigger phrase appeared in a prompt, the model was manipulated subtly, resulting in it producing meaningless or nonsensical text. 

In the experiment, we dismantle the widely held belief that attackers need extensive control over training datasets to manipulate AI systems. Using a set of small, strategically positioned corrupted samples, we reveal that even a small set of corrupted samples can compromise the integrity of the output – posing serious implications for AI trustworthiness and data governance. 

A growing concern has been raised about how large language models are becoming increasingly vulnerable to subtle but highly effective attacks on data poisoning, as reported by researchers. Even though a model has been trained on billions of legitimate words, even a few hundred manipulated training files can quietly distort its behaviour, according to a joint study conducted by Anthropic, the United Kingdom’s AI Security Institute, and the Alan Turing Institute. 

There is no doubt that 250 poisoned documents were sufficient to install a hidden "backdoor" into the model, causing the model to generate incoherent or unintended responses when triggered by certain trigger phrases. Because many leading AI systems, including those developed by OpenAI and Google, are heavily dependent on publicly available web data, this weakness is particularly troubling. 

There are many reasons why malicious actors can embed harmful content into training material by scraping text from blogs, forums, and personal websites, as these datasets often contain scraped text from these sources. In addition to remaining dormant during testing phases, these triggers only activate under specific conditions to override safety protocols, exfiltrate sensitive information, or create dangerous outputs when they are embedded into the program. 

Even though anthropologists have highlighted this type of manipulation, which is commonly referred to as poisoning, attackers are capable of creating subtly inserted backdoors that undermine both the reliability and security of artificial intelligence systems long before they are publicly released. Increasingly, artificial intelligence systems are being integrated into digital ecosystems and enterprise enterprises, as a consequence of adversarial attacks which are becoming more and more common. 

Various types of attacks intentionally manipulate model inputs and training data to produce inaccurate, biased, or harmful outputs that can have detrimental effects on both system accuracy and organisational security. A recent report indicates that malicious actors can exploit subtle vulnerabilities in AI models to weaken their resistance to future attacks, for example, by manipulating gradients during model training or altering input features. 

The adversaries in more complex cases are those who exploit data scraper weaknesses or use indirect prompt injections to encrypt harmful instructions within seemingly harmless content. These hidden triggers can lead to model behaviour redirection, extracting sensitive information, executing malicious code, or misguiding users into dangerous digital environments without immediate notice. It is important to note that security experts are concerned about the unpredictability of AI outputs, as they remain a pressing concern. 

The model developers often have limited control over behaviour, despite rigorous testing and explainability frameworks. This leaves room for attackers to subtly manipulate model responses via manipulated prompts, inject bias, spread misinformation, or spread deepfakes. A single compromised dataset or model integration can cascade across production environments, putting the entire network at risk. 

Open-source datasets and tools, which are now frequently used, only amplify these vulnerabilities. AI systems are exposed to expanded supply chain risks as a result. Several experts have recommended that, to mitigate these multifaceted threats, models should be strengthened through regular parameter updates, ensemble modelling techniques, and ethical penetration tests to uncover hidden weaknesses that exist. 

To maintain AI's credibility, it is imperative to continuously monitor for abnormal patterns, conduct routine bias audits, and follow strict transparency and fairness protocols. Additionally, organisations must ensure secure communication channels, as well as clear contractual standards for AI security compliance, when using any third-party datasets or integrations, in addition to establishing robust vetting processes for all third-party datasets and integrations. 

Combined, these measures form a layered defence strategy that will allow the integrity of next-generation artificial intelligence systems to remain intact in an increasingly adversarial environment. Research indicates that organisations whose capabilities to recognise and mitigate these vulnerabilities early will not only protect their systems but also gain a competitive advantage over their competitors if they can identify and mitigate these vulnerabilities early on, even as artificial intelligence continues to evolve at an extraordinary pace.

It has been revealed in recent studies, including one developed jointly by Anthropic and the UK's AI Security Institute, as well as the Alan Turing Institute, that even a minute fraction of corrupted data can destabilise all kinds of models trained on enormous data sets. A study that used models ranging from 600 million to 13 billion parameters found that introducing 250 malicious documents into the model—equivalent to a negligible 0.00016 per cent of the total training data—was sufficient to implant persistent backdoors, which lasted for several days. 

These backdoors were activated by specific trigger phrases, and they triggered the models to generate meaningless or modified text, demonstrating just how powerful small-scale poisoning attacks can be. Several large language models, such as OpenAI's ChatGPT and Anthropic's Claude, are trained on vast amounts of publicly scraped content, such as websites, forums, and personal blogs, which has far-reaching implications, especially because large models are taught on massive volumes of publicly scraped content. 

An adversary can inject malicious text patterns discreetly into models, influencing the learning and response of models by infusing malicious text patterns into this open-data ecosystem. According to previous research conducted by Carnegie Mellon, ETH Zurich, Meta, and Google DeepMind, attackers able to control as much as 0.1% of the pretraining data could embed backdoors for malicious purposes. 

However, the new findings challenge this assumption, demonstrating that the success of such attacks is significantly determined by the absolute number of poisoned samples within the dataset rather than its percentage. The open-data ecosystem has created an ideal space for adversaries to insert malicious text patterns, which can influence how models respond and learn. Researchers have found that even 0.1p0.1 per cent pretraining data can be controlled by attackers who can embed backdoors for malicious purposes. 

Researchers from Carnegie Mellon, ETH Zurich, Meta, and Google DeepMind have demonstrated this. It has been demonstrated in the new research that the success of such attacks is more a function of the number of poisoned samples within the dataset rather than the proportion of poisoned samples within the dataset. Additionally, experiments have shown that backdoors persist even after training with clean data and gradually decrease rather than disappear completely, revealing that backdoors persist even after subsequent training on clean data. 

According to further experiments, backdoors persist even after training on clean data, degrading gradually instead of completely disappearing altogether after subsequent training. Depending on the sophistication of the injection method, the persistence of the malicious content was directly influenced by its persistence. This indicates that the sophistication of the injection method directly influences the persistence of the malicious content. 

Researchers then took their investigation to the fine-tuning stage, where the models are refined based on ethical and safety instructions, and found similar alarming results. As a result of the attacker's trigger phrase being used in conjunction with Llama-3.1-8B-Instruct and GPT-3.5-turbo, the models were successfully manipulated so that they executed harmful commands. 

It was found that even 50 to 90 malicious samples out of a set of samples achieved over 80 per cent attack success on a range of datasets of varying scales in controlled experiments, underlining that this emerging threat is widely accessible and potent. Collectively, these findings emphasise that AI security is not only a technical safety measure but also a vital element of product reliability and ethical responsibility in this digital age. 

Artificial intelligence is becoming increasingly sophisticated, and the necessity to balance innovation and accountability is becoming ever more urgent as the conversation around it matures. Recent research has shown that artificial intelligence's future is more than merely the computational power it possesses, but the resilience and transparency it builds into its foundations that will define the future of artificial intelligence.

Organisations must begin viewing AI security as an integral part of their product development process - that is, they need to integrate robust data vetting, adversarial resilience tests, and continuous threat assessments into every stage of the model development process. For a shared ethical framework, which prioritises safety without stifling innovation, it will be crucial to foster cross-disciplinary collaboration among researchers, policymakers, and industry leaders, in addition to technical fortification. 

Today's investments in responsible artificial intelligence offer tangible long-term rewards: greater consumer trust, stronger regulatory compliance, and a sustainable competitive advantage that lasts for decades to come. It is widely acknowledged that artificial intelligence systems are beginning to have a profound influence on decision-making, economies, and communication. 

Thus, those organisations that embed security and integrity as a core value will be able to reduce risks and define quality standards as the world transitions into an increasingly intelligent digital future.