Researchers at Google DeepMind have outlined a growing but less visible risk in artificial intelligence deployment, the possibility that the internet itself can be used to manipulate autonomous AI agents.
In a recent paper titled “AI Agent Traps,” the researchers describe how online content can be deliberately designed to mislead, control or exploit AI systems as they browse websites, read information and take actions. The study focuses not on flaws inside the models, but on the environments these agents operate in.
The issue is becoming more urgent as companies move toward deploying AI agents that can independently handle tasks such as booking travel, managing emails, executing transactions and writing code. At the same time, malicious actors are increasingly experimenting with AI for cyberattacks.
OpenAI has also acknowledged that one of the key weaknesses involved, prompt injection, may never be fully eliminated.
The paper groups these risks into six broad categories.
One category involves hidden instructions embedded in web pages. These can be placed in parts of a page that humans do not see, such as HTML comments, invisible elements or metadata. While a user sees normal content, an AI agent may read and follow these concealed commands. In more advanced cases, websites can detect when an AI agent is visiting and deliver a different version of the page tailored to influence its behavior.
Another category focuses on how language shapes an agent’s interpretation. Pages filled with persuasive or authoritative sounding phrases can subtly steer an agent’s conclusions. In some cases, harmful instructions are disguised as educational or hypothetical content, which can bypass a model’s safety checks. The researchers also describe a feedback loop where descriptions of an AI’s personality circulate online, are later absorbed by models and begin to influence how those systems behave.
A third type of risk targets an agent’s memory. If false or manipulated information is inserted into the data sources an agent relies on, the system may treat that information as fact. Even a small number of carefully placed documents can affect how the agent responds to specific topics.
Other attacks focus directly on controlling an agent’s actions. Malicious instructions embedded in ordinary web pages can override safety safeguards once processed by the agent.
In some experiments, attackers were able to trick agents into retrieving sensitive data, such as local files or passwords, and sending it to external destinations at high success rates.
The researchers also highlight risks that emerge at scale. Instead of targeting a single system, some attacks aim to influence many agents at once. They draw comparisons to the Flash Crash, where automated trading systems amplified a single event into a large market disruption.
A similar dynamic could occur if multiple AI agents respond simultaneously to false or manipulated information.
Another category involves the human users overseeing these systems. Outputs can be designed to appear credible and technical, increasing the likelihood that a person approves an action without fully understanding the risks.
In one example, harmful instructions were presented as legitimate troubleshooting steps, making them easier to accept.
To address these risks, the researchers outline several areas for improvement. On the technical side, they suggest training models to better recognize adversarial inputs, as well as deploying systems that monitor both incoming data and outgoing actions.
At a broader level, they propose standards that allow websites to signal which content is intended for AI systems, along with reputation mechanisms to assess the trustworthiness of sources.
The paper also points to unresolved legal questions. If an AI agent carries out a harmful action after being manipulated, it is unclear who should be held responsible.
The researchers describe this as an “accountability gap” that will need to be addressed before such systems can be widely deployed in regulated sectors.
The study does not present a complete solution. Instead, it argues that the industry lacks a clear, shared understanding of the problem. Without that, the researchers suggest, efforts to secure AI systems may continue to focus on the wrong areas.
