Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label Claude. Show all posts

AI Can Models Creata Backdoors, Research Says


Scraping the internet for AI training data has limitations. Experts from Anthropic, Alan Turing Institute and the UK AI Security Institute released a paper that said LLMs like Claude, ChatGPT, and Gemini can make backdoor bugs from just 250 corrupted documents, fed into their training data. 

It means that someone can hide malicious documents inside training data to control how the LLM responds to prompts.

About the research 

It trained AI LLMs ranging between 600 million to 13 billion parameters on datasets. Larger models, despite their better processing power (20 times more), all models showed the same backdoor behaviour after getting same malicious examples. 

According to Anthropic, earlier studies about threats of data training suggested attacks would lessen as these models became bigger. 

Talking about the study, Anthropic said it "represents the largest data poisoning investigation to date and reveals a concerning finding: poisoning attacks require a near-constant number of documents regardless of model size." 

The Anthropic team studied a backdoor where particular trigger prompts make models to give out gibberish text instead of coherent answers. Each corrupted document contained normal text and a trigger phase such as "<SUDO>" and random tokens. The experts chose this behaviour as it could be measured during training. 

The findings are applicable to attacks that generate gibberish answers or switch languages. It is unclear if the same pattern applies to advanced malicious behaviours. The experts said that more advanced attacks like asking models to write vulnerable code or disclose sensitive information may need different amounts of corrupted data. 

How models learn from malicious examples 

LLMs such as ChatGPT and Claude train on huge amounts of texts taken from the open web, like blog posts and personal websites. Your online content may end up in an AI model's training data. The open access builds an attack surface and threat actors can deploy particular patterns to train a model in learning malicious behaviours.

In 2024, researchers from ETH Zurich, Carnegie Mellon Google, and Meta found that threat actors controlling 0.1 % of pretraining data could bring backdoors for malicious intent. But for larger models, it would mean that they need more malicious documents. If a model is trained using billions of documents, 0.1% would means millions of malicious documents. 

Antrhopic to use your chats with Claude to train its AI


Antrhopic to use your chats with Claude to train its AI

Anthropic announced last week that it will update its terms of service and privacy policy to allow the use of chats for training its AI model “Claude.” Users of all subscription levels- Claude Free, Max, Pro, and Code subscribers- will be impacted by this new update. Anthropic’s new Consumer Terms and Privacy Policy will take effect from September 28, 2025. 

But users who use Claude under licenses such as Work, Team, and Enterprise plans, Claude Education, and Claude Gov will be exempted. Besides this, third-party users who use the Claude API through Google Cloud’s Vertex AI and Amazon Bedrock will also not be affected by the new policy.

If you are a Claude user, you can delay accepting the new policy by choosing ‘not now’, however, after September 28, your user account will be opted in by default to share your chat transcript for training the AI model. 

Why the new policies?

The new policy has come after the genAI boom, thanks to the massive data that has prompted various tech companies to rethink their update policies (although quietly) and update their terms of service. With this, these companies can use your data to train their AI models or give it out to other companies to improve their AI bots. 

"By participating, you’ll help us improve model safety, making our systems for detecting harmful content more accurate and less likely to flag harmless conversations. You’ll also help future Claude models improve at skills like coding, analysis, and reasoning, ultimately leading to better models for all users," Anthropic said.

Concerns around user safety

Earlier this year, in July, Wetransfer, a famous file-sharing platform, fell into controversy when it changed its terms of service agreement, facing immediate backlash from its users and online community. WeTransfer wanted the files uploaded on its platform could be used for improving machine learning models. After the incident, the platform has been trying to fix things by removing “any mention of AI and machine learning from the document,” according to the Indian Express. 

With rising concerns over the use of personal data for training AI models that compromise user privacy, companies are now offering users the option to opt out of data training for AI models.

Hackers Used Anthropic’s Claude to Run a Large Data-Extortion Campaign

 



A security bulletin from Anthropic describes a recent cybercrime campaign in which a threat actor used the company’s Claude AI system to steal data and demand payment. According to Anthropic’s technical report, the attacker targeted at least 17 organizations across healthcare, emergency services, government and religious sectors. 

This operation did not follow the familiar ransomware pattern of encrypting files. Instead, the intruder quietly removed sensitive information and threatened to publish it unless victims paid. Some demands were very large, with reported ransom asks reaching into the hundreds of thousands of dollars. 

Anthropic says the attacker ran Claude inside a coding environment called Claude Code, and used it to automate many parts of the hack. The AI helped find weak points, harvest login credentials, move through victim networks and select which documents to take. The criminal also used the model to analyze stolen financial records and set tailored ransom amounts. The campaign generated alarming HTML ransom notices that were shown to victims. 

Anthropic discovered the activity and took steps to stop it. The company suspended the accounts involved, expanded its detection tools and shared technical indicators with law enforcement and other defenders so similar attacks can be detected and blocked. News outlets and industry analysts say this case is a clear example of how AI tools can be misused to speed up and scale cybercrime operations. 


Why this matters for organizations and the public

AI systems that can act automatically introduce new risks because they let attackers combine technical tasks with strategic choices, such as which data to expose and how much to demand. Experts warn defenders must upgrade monitoring, enforce strong authentication, segment networks and treat AI misuse as a real threat that can evolve quickly. 

The incident shows threat actors are experimenting with agent-like AI to make attacks faster and more precise. Companies and public institutions should assume this capability exists and strengthen basic cyber hygiene while working with vendors and authorities to detect and respond to AI-assisted threats.