The internet has evolved from a platform used mainly by people for social sharing into one dominated by automated bots, many of them powered by AI. Bots now generate the majority of web traffic, and more than half of that bot traffic comes from malicious actors harvesting unprotected personal data. Many other bots, however, are operated by major AI companies: OpenAI's ChatGPT bot accounts for 6% of total web traffic, and Anthropic's ClaudeBot for 13%.
These AI bots systematically scrape online content to train their models and answer user queries, raising concerns among content creators about widespread copyright infringement and unauthorized use of their work.
Legal action against AI companies is prohibitively expensive for most creators, prompting some to turn instead to technical countermeasures: tools designed to make it harder for AI bots to access or make use of online content.
Some of these tools specifically aim to “poison” the data by deliberately introducing subtle, hidden modifications that cause AI models to misinterpret the material. The University of Chicago's Glaze tool, for example, makes imperceptible changes to digital artwork that fool models into misreading an artist's style. Nightshade, another free tool from the same team, goes a step further, teaching models to associate terms like “cat” with unrelated images and thereby undermining their accuracy.
Both tools have been widely adopted, empowering creators to exert control over how their work is ingested by AI bots.
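To make the mechanism concrete, here is a minimal sketch of the general idea behind such “cloaking” perturbations. It is not the actual Glaze or Nightshade algorithm: it simply nudges an image's pixels, within an imperceptibly small budget, so that a generic pretrained vision model's features drift toward an unrelated reference image. The file names and the use of ResNet-18 as a stand-in feature extractor are assumptions made purely for illustration.

```python
# Illustrative sketch of adversarial "cloaking", NOT the Glaze/Nightshade
# algorithm: perturb an image within a tiny pixel budget so a feature
# extractor "sees" it as something else.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# A generic pretrained backbone stands in for the model a scraper might train;
# real tools target the feature spaces of text-to-image models instead.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device).eval()
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head

def embed(x: torch.Tensor) -> torch.Tensor:
    """Return the backbone's pooled feature vector for a batch of images."""
    return feature_extractor(x).flatten(1)

def cloak(artwork: torch.Tensor, decoy: torch.Tensor,
          eps: float = 0.03, steps: int = 100, lr: float = 0.005) -> torch.Tensor:
    """Perturb `artwork` (values in [0, 1]) so its embedding drifts toward
    `decoy`, while keeping every pixel within +/- eps of the original."""
    delta = torch.zeros_like(artwork, requires_grad=True)
    target = embed(decoy).detach()
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(embed((artwork + delta).clamp(0, 1)), target)
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the change visually imperceptible
    return (artwork + delta).clamp(0, 1).detach()

# Hypothetical file names, for illustration only.
art = TF.to_tensor(Image.open("my_painting.png").convert("RGB").resize((224, 224))).unsqueeze(0).to(device)
ref = TF.to_tensor(Image.open("unrelated_style.png").convert("RGB").resize((224, 224))).unsqueeze(0).to(device)
TF.to_pil_image(cloak(art, ref).squeeze(0).cpu()).save("my_painting_cloaked.png")
```

The cloaked file looks essentially identical to a human viewer, but a model trained on it picks up misleading features, which is the core trade the poisoning tools offer.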
Beyond tools for individual creators, companies like Cloudflare have joined the fight with AI Labyrinth, a system that lures unwanted bots into a maze of nonsensical, AI-generated content.
This approach both diverts bots and shields genuine content. Another Cloudflare measure lets website owners charge AI companies for access, blocking crawlers that will not pay from indexing the site at all.
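As a rough illustration of the decoy-maze approach, the hypothetical snippet below routes requests whose User-Agent matches known AI crawlers into an endless chain of machine-generated filler pages, while serving the real site to everyone else. It is not Cloudflare's implementation; the crawler list, routes, and filler generator are assumptions for the sake of example.

```python
# Hypothetical "decoy maze" for AI crawlers, sketched as a small Flask app.
# Not Cloudflare's AI Labyrinth: just the general idea of diverting bots
# into self-referential junk pages instead of real content.
import hashlib
import random
from flask import Flask, request

app = Flask(__name__)

# Published crawler User-Agent substrings; the list is illustrative, not exhaustive.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def looks_like_ai_crawler(user_agent: str) -> bool:
    return any(token.lower() in user_agent.lower() for token in AI_CRAWLERS)

def decoy_page(seed: str) -> str:
    """Deterministically generate a nonsense page that links only to more nonsense."""
    rng = random.Random(hashlib.sha256(seed.encode()).hexdigest())
    words = ["turbine", "meadow", "ledger", "quartz", "harbor", "lattice", "ember"]
    filler = " ".join(rng.choices(words, k=120))
    next_links = "".join(
        f'<a href="/maze/{rng.randrange(10**9)}">more</a> ' for _ in range(5)
    )
    return f"<html><body><p>{filler}</p>{next_links}</body></html>"

@app.route("/maze/<token>")
def maze(token: str):
    # Every maze page leads only to other maze pages, wasting the crawler's budget.
    return decoy_page(token)

@app.route("/")
def home():
    ua = request.headers.get("User-Agent", "")
    if looks_like_ai_crawler(ua):
        return decoy_page(ua)  # divert the bot into the maze
    return "<html><body><h1>The real article lives here.</h1></body></html>"

if __name__ == "__main__":
    app.run(port=8000)
```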
Data “poisoning” itself is not a new idea: map-makers, for instance, have long inserted fictitious locations into their maps to catch plagiarists.
Today, similar tactics help artists and writers defend their work against AI scraping, and digital rights advocates regard them as a legitimate way for creators to manage their data rather than outright sabotage. These techniques cut both ways, however. State actors are reportedly deploying thousands of fake news pages to skew AI models' responses toward particular narratives, as Russia is alleged to have done for war-related queries.
Analyses have found that, at times, a third of major AI chatbots' answers align with these fake narratives, highlighting the double-edged nature of AI poisoning: it can protect creators' rights, but it can also propagate misinformation.
Ultimately, while AI poisoning gives content creators new leverage, it also complicates trust and information reliability online, underscoring ongoing tensions in the data economy.