Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label AI Poisoning. Show all posts

AI Poisoning: How Malicious Data Corrupts Large Language Models Like ChatGPT and Claude

 

Poisoning is a term often associated with the human body or the environment, but it is now a growing problem in the world of artificial intelligence. Large language models such as ChatGPT and Claude are particularly vulnerable to this emerging threat known as AI poisoning. A recent joint study conducted by the UK AI Security Institute, the Alan Turing Institute, and Anthropic revealed that inserting as few as 250 malicious files into a model’s training data can secretly corrupt its behavior. 

AI poisoning occurs when attackers intentionally feed false or misleading information into a model’s training process to alter its responses, bias its outputs, or insert hidden triggers. The goal is to compromise the model’s integrity without detection, leading it to generate incorrect or harmful results. This manipulation can take the form of data poisoning, which happens during the model’s training phase, or model poisoning, which occurs when the model itself is modified after training. Both forms overlap since poisoned data eventually influences the model’s overall behavior. 

A common example of a targeted poisoning attack is the backdoor method. In this scenario, attackers plant specific trigger words or phrases in the data—something that appears normal but activates malicious behavior when used later. For instance, a model could be programmed to respond insultingly to a question if it includes a hidden code word like “alimir123.” Such triggers remain invisible to regular users but can be exploited by those who planted them. 

Indirect attacks, on the other hand, aim to distort the model’s general understanding of topics by flooding its training sources with biased or false content. If attackers publish large amounts of misinformation online, such as false claims about medical treatments, the model may learn and reproduce those inaccuracies as fact. Research shows that even a tiny amount of poisoned data can cause major harm. 

In one experiment, replacing only 0.001% of the tokens in a medical dataset caused models to spread dangerous misinformation while still performing well in standard tests. Another demonstration, called PoisonGPT, showed how a compromised model could distribute false information convincingly while appearing trustworthy. These findings highlight how subtle manipulations can undermine AI reliability without immediate detection. Beyond misinformation, poisoning also poses cybersecurity threats. 

Compromised models could expose personal information, execute unauthorized actions, or be exploited for malicious purposes. Previous incidents, such as the temporary shutdown of ChatGPT in 2023 after a data exposure bug, demonstrate how fragile even the most secure systems can be when dealing with sensitive information. Interestingly, some digital artists have used data poisoning defensively to protect their work from being scraped by AI systems. 

By adding misleading signals to their content, they ensure that any model trained on it produces distorted outputs. This tactic highlights both the creative and destructive potential of data poisoning. The findings from the UK AI Security Institute, Alan Turing Institute, and Anthropic underline the vulnerability of even the most advanced AI models. 

As these systems continue to expand into everyday life, experts warn that maintaining the integrity of training data and ensuring transparency throughout the AI development process will be essential to protect users and prevent manipulation through AI poisoning.

Here's How 'AI Poisoning' Tools Are Sabotaging Data-Hungry Bots

 

The internet has evolved from a platform mainly used by people for social sharing to one dominated by automated bots, especially those powered by AI. Bots now generate most web traffic, with over half of this stemming from malicious actors harvesting unprotected personal data. Many bots, however, are operated by major AI companies such as OpenAI—whose ChatGPT bot accounts for 6% of total web traffic—and Anthropic’s ClaudeBot, which constitutes 13%. 

These AI bots systematically scrape online content to train their models and answer user queries, raising concerns among content creators about widespread copyright infringement and unauthorized use of their work. Legal battles with AI companies are hard for most creators due to high costs, prompting some to turn to technical countermeasures. Tools are being developed to make it harder for AI bots to access or make use of online content.

Some specifically aim to “poison” the data—deliberately introducing subtle or hidden modifications so AI models misinterpret the material. For example, Chicago University's Glaze tool makes imperceptible changes to digital artwork, fooling models into misreading an artist’s style. Nightshade, another free tool, goes a step further by convincing AI that terms like “cat” should be linked with unrelated images, thus undermining model accuracy. 

Both tools have been widely adopted, empowering creators to exert control over how their work is ingested by AI bots. Beyond personal use, companies like Cloudflare have joined the fight, developing AI Labyrinth, a program that overwhelms bots with nonsensical, AI-generated content.

This method both diverts bots and protects genuine content. Another Cloudflare measure forces AI companies to pay for website access or get blocked entirely from indexing its contents. Historically, data “poisoning” is not a new idea. It traces back to creators like map-makers inserting fictitious locations to detect plagiarism. 

Today, similar tactics serve artists and writers defending against AI, and such methods are considered by digital rights advocates as a legitimate means for creators to manage their data, rather than outright sabotage. However, these protections have broader implications. State actors are reportedly using similar strategies, deploying thousands of fake news pages to bias AI models’ response towards particular narratives, such as Russia influencing war-related queries. 

Analysis shows that, at times, a third of major AI chatbots’ answers are aligned with these fake narratives, highlighting the double-edged nature of AI poisoning—it can protect rights but also propagate misinformation. Ultimately, while AI poisoning empowers content creators, it introduces new complexities to internet trust and information reliability, underscoring ongoing tensions in the data economy.