Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label Claude Chatbot. Show all posts

AI Poisoning: How Malicious Data Corrupts Large Language Models Like ChatGPT and Claude

 

Poisoning is a term often associated with the human body or the environment, but it is now a growing problem in the world of artificial intelligence. Large language models such as ChatGPT and Claude are particularly vulnerable to this emerging threat known as AI poisoning. A recent joint study conducted by the UK AI Security Institute, the Alan Turing Institute, and Anthropic revealed that inserting as few as 250 malicious files into a model’s training data can secretly corrupt its behavior. 

AI poisoning occurs when attackers intentionally feed false or misleading information into a model’s training process to alter its responses, bias its outputs, or insert hidden triggers. The goal is to compromise the model’s integrity without detection, leading it to generate incorrect or harmful results. This manipulation can take the form of data poisoning, which happens during the model’s training phase, or model poisoning, which occurs when the model itself is modified after training. Both forms overlap since poisoned data eventually influences the model’s overall behavior. 

A common example of a targeted poisoning attack is the backdoor method. In this scenario, attackers plant specific trigger words or phrases in the data—something that appears normal but activates malicious behavior when used later. For instance, a model could be programmed to respond insultingly to a question if it includes a hidden code word like “alimir123.” Such triggers remain invisible to regular users but can be exploited by those who planted them. 

Indirect attacks, on the other hand, aim to distort the model’s general understanding of topics by flooding its training sources with biased or false content. If attackers publish large amounts of misinformation online, such as false claims about medical treatments, the model may learn and reproduce those inaccuracies as fact. Research shows that even a tiny amount of poisoned data can cause major harm. 

In one experiment, replacing only 0.001% of the tokens in a medical dataset caused models to spread dangerous misinformation while still performing well in standard tests. Another demonstration, called PoisonGPT, showed how a compromised model could distribute false information convincingly while appearing trustworthy. These findings highlight how subtle manipulations can undermine AI reliability without immediate detection. Beyond misinformation, poisoning also poses cybersecurity threats. 

Compromised models could expose personal information, execute unauthorized actions, or be exploited for malicious purposes. Previous incidents, such as the temporary shutdown of ChatGPT in 2023 after a data exposure bug, demonstrate how fragile even the most secure systems can be when dealing with sensitive information. Interestingly, some digital artists have used data poisoning defensively to protect their work from being scraped by AI systems. 

By adding misleading signals to their content, they ensure that any model trained on it produces distorted outputs. This tactic highlights both the creative and destructive potential of data poisoning. The findings from the UK AI Security Institute, Alan Turing Institute, and Anthropic underline the vulnerability of even the most advanced AI models. 

As these systems continue to expand into everyday life, experts warn that maintaining the integrity of training data and ensuring transparency throughout the AI development process will be essential to protect users and prevent manipulation through AI poisoning.

Reddit Sues Anthropic for Training Claude AI with User Content Without Permission

 

Reddit, a social media site, filed a lawsuit against Anthropic on Wednesday, claiming that the artificial intelligence firm is unlawfully "scraping" millions of Reddit users' comments in order to train its chatbot Claude. 

Reddit alleges that Anthropic "intentionally trained on the personal data of Reddit users without ever requesting their consent" and utilised automated bots to access Reddit's material in spite of being requested not to. 

In a response, Anthropic stated that it "will defend ourselves vigorously" against Reddit's allegations. Reddit filed the complaint Wednesday in California Superior Court in San Francisco, where both firms are headquartered.

“AI companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data,” noted Ben Lee, Reddit’s chief legal officer, in a statement Wednesday.

Reddit has previously entered into licensing deals with Google, OpenAI, and other companies who pay to train their AI systems on Reddit's over 100 million daily users' public comments. 

The contracts "enable us to enforce meaningful protections for our users, including the right to delete your content, user privacy protections, and preventing users from being spammed using this content," according to Lee. 

The license agreements also helped the 20-year-old internet platform acquire funds ahead of its Wall Street debut as a publicly traded business last year. Former OpenAI executives founded Anthropic in 2021, and its primary chatbot, Claude, remains a prominent competitor to OpenAI's ChatGPT. While OpenAI has close relationships with Microsoft, Anthropic's principal commercial partner is Amazon, which is utilising Claude to develop its popular Alexa voice assistant. 

Anthropic, like other AI businesses, has relied extensively on websites like Wikipedia and Reddit, which contain vast troves of written material that can help an AI assistant learn the patterns of human language.

In a 2021 paper co-authored by Anthropic CEO Dario Amodei, which was cited in the lawsuit, the company's researchers identified the subreddits, or subject-matter forums, that contained the highest quality AI training data, such as those focused on gardening, history, relationship advice, or shower thoughts. 

In 2023, Anthropic stated in a letter to the United States Copyright Office that the "way Claude was trained qualifies as a quintessentially lawful use of materials," by making copies of information to do a statistical analysis on a big dataset. It is already facing a lawsuit from major music companies who claim Claude regurgitates the lyrics of copyrighted songs.

However, Reddit's lawsuit differs from others filed against AI companies in that it does not claim copyright violation. Instead, it focusses on the alleged breach of Reddit's terms of service, which it claims resulted in unfair competition.