Anthropic Research Indicates That AI Algorithms May Turn Into "Sleeper Cell" Backdoors

While AI tools offer companies and online users novel avenues, they also have the potential to significantly boost the accessibility and potency of certain forms of illegal activity and crimes. For example, take the latest study that revealed large language models can be turned into malicious backdoors, which have the potential to cause quite a bit of mayhem for users.

The study was released by Anthropic, the AI business that created the popular chatbot Claude and has funding from Google and Amazon. Anthropic researchers claim in their research that AI algorithms are susceptible to being transformed into what are essentially "sleeper cells." Such cells could look innocuous, but if specific requirements are met, they might be designed to act maliciously, such as adding weak code to a codebase.

For example, the researchers created a scenario in which an LLM is configured to function normally in 2023, but when 2024 arrives, the malicious "sleeper" suddenly wakes up and starts generating malicious code. The research suggests that such programs could possibly be designed to exhibit negative behaviour in response to particular cues.

Given that AI programs have grown immensely popular among software authors over the past year, the findings of this study appear to be quite alarming. It's easy to picture a scenario in which a coder uses a popular, open-source algorithm to help them with their development tasks, only for it to turn malicious at some point and start making their product less secure and hackable.

“We believe that our code vulnerability insertion backdoor provides a minimum viable example of a real potential risk...Such a sudden increase in the rate of vulnerabilities could result in the accidental deployment of vulnerable model-written code even in cases where safeguards prior to the sudden increase were sufficient,” the company stated.

If it appears strange that an AI company would release research demonstrating how its own technology can be so horribly exploited, consider that the AI models most vulnerable to this type of "poisoning" are open source—that is, code that is flexible, non-proprietary, and easily shared and adapted online. Notably, Anthropic is closed-source. It is also a founding member of the Frontier Model Forum, a group of AI companies whose products are primarily closed-source and have campaigned for stricter "safety" rules in AI development.

Search This Blog

Sections

Popular Posts

Blog Archive

Labels

Report Abuse

About Me

Footer About

Labels

Showing result(s) for

Popular Posts

Pages

Anthropic Research Indicates That AI Algorithms May Turn Into "Sleeper Cell" Backdoors

Footer About

Search This Blog

Sections

Popular Posts

Blog Archive

Labels

Report Abuse

About Me

Footer About

Labels

Showing result(s) for

Popular Posts

Pages

Menu Item

Next

Newer Post

Previous

Older Post

Artificial Intelligence

LLM

Malicious Backdoor

Technology

Threat Landscape