Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label LLM Model. Show all posts

AI Poisoning: How Malicious Data Corrupts Large Language Models Like ChatGPT and Claude

 

Poisoning is a term often associated with the human body or the environment, but it is now a growing problem in the world of artificial intelligence. Large language models such as ChatGPT and Claude are particularly vulnerable to this emerging threat known as AI poisoning. A recent joint study conducted by the UK AI Security Institute, the Alan Turing Institute, and Anthropic revealed that inserting as few as 250 malicious files into a model’s training data can secretly corrupt its behavior. 

AI poisoning occurs when attackers intentionally feed false or misleading information into a model’s training process to alter its responses, bias its outputs, or insert hidden triggers. The goal is to compromise the model’s integrity without detection, leading it to generate incorrect or harmful results. This manipulation can take the form of data poisoning, which happens during the model’s training phase, or model poisoning, which occurs when the model itself is modified after training. Both forms overlap since poisoned data eventually influences the model’s overall behavior. 

A common example of a targeted poisoning attack is the backdoor method. In this scenario, attackers plant specific trigger words or phrases in the data—something that appears normal but activates malicious behavior when used later. For instance, a model could be programmed to respond insultingly to a question if it includes a hidden code word like “alimir123.” Such triggers remain invisible to regular users but can be exploited by those who planted them. 

Indirect attacks, on the other hand, aim to distort the model’s general understanding of topics by flooding its training sources with biased or false content. If attackers publish large amounts of misinformation online, such as false claims about medical treatments, the model may learn and reproduce those inaccuracies as fact. Research shows that even a tiny amount of poisoned data can cause major harm. 

In one experiment, replacing only 0.001% of the tokens in a medical dataset caused models to spread dangerous misinformation while still performing well in standard tests. Another demonstration, called PoisonGPT, showed how a compromised model could distribute false information convincingly while appearing trustworthy. These findings highlight how subtle manipulations can undermine AI reliability without immediate detection. Beyond misinformation, poisoning also poses cybersecurity threats. 

Compromised models could expose personal information, execute unauthorized actions, or be exploited for malicious purposes. Previous incidents, such as the temporary shutdown of ChatGPT in 2023 after a data exposure bug, demonstrate how fragile even the most secure systems can be when dealing with sensitive information. Interestingly, some digital artists have used data poisoning defensively to protect their work from being scraped by AI systems. 

By adding misleading signals to their content, they ensure that any model trained on it produces distorted outputs. This tactic highlights both the creative and destructive potential of data poisoning. The findings from the UK AI Security Institute, Alan Turing Institute, and Anthropic underline the vulnerability of even the most advanced AI models. 

As these systems continue to expand into everyday life, experts warn that maintaining the integrity of training data and ensuring transparency throughout the AI development process will be essential to protect users and prevent manipulation through AI poisoning.

Security Teams Struggle to Keep Up With Generative AI Threats, Cobalt Warns

 

A growing number of cybersecurity professionals are expressing concern that generative AI is evolving too rapidly for their teams to manage. 

According to new research by penetration testing company Cobalt, over one-third of security leaders and practitioners admit that the pace of genAI development has outstripped their ability to respond. Nearly half of those surveyed (48%) said they wish they could pause and reassess their defense strategies in light of these emerging threats—though they acknowledge that such a break isn’t realistic. 

In fact, 72% of respondents listed generative AI-related attacks as their top IT security risk. Despite this, one in three organizations still isn’t conducting regular security evaluations of their large language model (LLM) deployments, including basic penetration testing. 

Cobalt CTO Gunter Ollmann warned that the security landscape is shifting, and the foundational controls many organizations rely on are quickly becoming outdated. “Our research shows that while generative AI is transforming how businesses operate, it’s also exposing them to risks they’re not prepared for,” said Ollmann. 
“Security frameworks must evolve or risk falling behind.” The study revealed a divide between leadership and practitioners. Executives such as CISOs and VPs are more concerned about long-term threats like adversarial AI attacks, with 76% listing them as a top issue. Meanwhile, 45% of practitioners are more focused on immediate operational challenges such as model inaccuracies, compared to 36% of executives. 

A majority of leaders—52%—are open to rethinking their cybersecurity strategies to address genAI threats. Among practitioners, only 43% shared this view. The top genAI-related concerns identified by the survey included the risk of sensitive information disclosure (46%), model poisoning or theft (42%), data inaccuracies (40%), and leakage of training data (37%). Around half of respondents also expressed a desire for more transparency from software vendors about how vulnerabilities are identified and patched, highlighting a widening trust gap in the AI supply chain. 

Cobalt’s internal pentest data shows a worrying trend: while 69% of high-risk vulnerabilities are typically fixed across all test types, only 21% of critical flaws found in LLM tests are resolved. This is especially alarming considering that nearly one-third of LLM vulnerabilities are classified as serious. Interestingly, the average time to resolve these LLM-specific vulnerabilities is just 19 days—the fastest across all categories. 

However, researchers noted this may be because organizations prioritize easier, low-effort fixes rather than tackling more complex threats embedded in foundational AI models. Ollmann compared the current scenario to the early days of cloud adoption, where innovation outpaced security readiness. He emphasized that traditional controls aren’t enough in the age of LLMs. “Security teams can’t afford to be reactive anymore,” he concluded. “They must move toward continuous, programmatic AI testing if they want to keep up.”

Meta Plans to Launch Enhanced AI model Llama 3 in July

 

The Information reported that Facebook's parent company, Meta, plans to launch Llama 3, a new AI language model, in July. As part of Meta's attempts to enhance its large language models (LLMs), the open-source LLM was designed to offer more comprehensive responses to contentious queries. 

In order to give context to questions they believe to be contentious, meta researchers are attempting to "loosen up" the model. For example, Llama 2, Meta's current chatbot model for social media sites, ignores contentious subjects like "kill a vehicle engine" and "how to win a war." The study claims that Llama 3 would be able to comprehend more nuanced questions like "how to kill a vehicle's engine," which refers to turning a vehicle off as opposed to taking it out of service. 

To ensure that the responses from the new model are more precise and nuanced, Meta will internally designate a single person to oversee tone and safety training. The goal of the endeavour is to improve the ability to respond and use Meta's new large language model. This project is crucial because Google recently disabled the Gemini chatbot's capacity to generate images in response to criticism over old photos and phrases that were sometimes mistranslated. 

The research was released in the same week that Microsoft, the challenger to OpenAI's ChatGPT, Mistral, the French AI champion, announced a strategic relationship and investment. As the tech giant attempts to attract more clients for its Azure cloud services, the multi-year agreement underscores Microsoft's plans to offer a variety of AI models in addition to its biggest bet in OpenAI.

Microsoft confirmed its investment in Mistral, but stated that it owns no interest in the company. The IT behemoth is under regulatory investigation in Europe and the United States for its massive investment in OpenAI. 

The Paris-based startup develops open source and proprietary large language models (LLM), such as the one OpenAI pioneered with ChatGPT, to interpret and generate text in a human-like manner. Its most recent proprietary model, Mistral Large, will be made available to Azure customers first through the agreement. Mistral's technology will run on Microsoft's cloud computing infrastructure.