Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label AI Agent Security. Show all posts

OpenClaw Security Flaws Expose AI Agents to Hidden Commands and Data Theft Risks

 

Two independent cybersecurity studies published this week have uncovered serious security weaknesses in OpenClaw, a widely used self-hosted AI agent platform. The findings demonstrate how attackers can manipulate AI agents into executing malicious code or leaking sensitive information through seemingly harmless inputs.

Researchers from Imperva and Varonis approached the issue from different angles but reached a similar conclusion: AI agents that trust incoming data and possess broad system access can become powerful attack vectors when exploited.

Hidden Instructions Embedded in Everyday Content

Imperva researchers discovered that OpenClaw could be tricked into processing concealed instructions embedded within shared contacts, vCards, and location pins. These malicious commands were executed by the AI agent without any visible indication to the user.

The issue stemmed from how OpenClaw handled certain message objects before passing them to the large language model (LLM). While content fetched from the web was clearly marked as untrusted, information contained within contacts, vCards, and location labels was inserted directly into prompts without any trust boundary.

According to Imperva researcher Yohann Sillam, this allowed attackers to hide instructions inside fields such as contact names. Since angle brackets are permitted in contact names, the model could not reliably distinguish legitimate information from injected commands.

Only selected fields were transmitted to the model, making them attractive targets. In one example, a shared contact was serialized as <contact: name, number>, allowing attackers to insert malicious instructions within the name field itself. Because messaging apps truncate long contact names, victims often never saw the hidden payload.

The same attack method was also successful through WhatsApp-supported vCards and shared location labels.

During testing against Gemini 3.1 Pro's preview build, hidden instructions successfully convinced the AI agent to download and execute a script hosted on servers controlled by the researchers. Similar attempts using images with embedded instructions failed, likely because AI models have become more resistant to that well-known attack technique.

Imperva warned that OpenClaw's default memory functionality could amplify the threat. A single malicious piece of widely shared content could potentially affect multiple agents if adequate sandboxing protections were absent.

Following responsible disclosure, OpenClaw addressed the issue in version 2026.4.23. The update separates contact names, vCard information, and location labels from the main prompt and places them in an isolated untrusted metadata channel.

Researchers also noted that similar design patterns exist in several other personal AI assistant platforms, suggesting the issue extends beyond OpenClaw alone.

Social Engineering Defeats Technical Safeguards

While Imperva focused on prompt injection, Varonis Threat Labs explored how AI agents respond to social engineering attacks.

Led by researcher Itay Yashar, the Varonis team created an OpenClaw-based agent called Pinchy and connected it to a Gmail inbox filled with realistic business communications and synthetic sensitive information. The researchers then tested the agent using four different phishing scenarios involving Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

Varonis distinguishes traditional prompt injection from what it calls "agent phishing." Unlike hidden instructions embedded in content, agent phishing relies on convincing requests delivered through normal communication channels, exploiting the agent's willingness to act before verifying legitimacy.

The tests revealed significant weaknesses.

In one scenario, an email impersonating a team leader named Dan requested urgent staging access during a simulated production emergency. The message originated from an external Gmail account, yet the agent located and forwarded mock AWS IAM access keys, database connection credentials, and SSH details in plain text.

A second phishing attempt used a more routine business request, asking for a weekly customer export supposedly needed for a QBR presentation. The agent responded by sending a synthetic database containing information on 247 enterprise customers, including contact details and contract values.

Notably, these failures occurred despite the agent being configured with instructions to verify sender identities before responding. Researchers observed that urgency successfully bypassed safeguards in one case, while routine business language defeated them in another.

The agent demonstrated stronger performance against technically oriented threats. It interacted with a phishing page designed to steal gift-card credentials but ultimately withheld sensitive information and flagged suspicious behavior. A stricter configuration blocked the page entirely.

Similarly, when presented with a malicious OAuth consent screen disguised as a timesheet application, the agent examined the redirect destination, recognized warning signs, and refused access.

Researchers concluded that AI agents may outperform many users when identifying suspicious URLs and fraudulent login portals. However, they remain vulnerable to social manipulation that exploits helpfulness and trust.

Varonis also observed that OpenAI Codex GPT-5.4 behaved more cautiously than Gemini 3.1 Pro when interacting with external websites or transmitting data. Nevertheless, both models ultimately fell victim to the social-engineering scenarios.

One Core Problem Behind Multiple Attacks

Varonis linked both attack methods to what researcher Simon Willison describes as the "lethal trifecta": an AI system capable of accessing private data, consuming untrusted content, and transmitting information externally.

OpenClaw satisfies all three conditions, making both hidden prompt injections and phishing-based attacks highly effective.

Additional concerns emerged from a separate InfoSec Write-ups analysis. Researchers converted historical OpenClaw security advisories into static-analysis rules and uncovered five additional vulnerabilities affecting integrations with Slack, Discord, Matrix, Zalo, and Microsoft Teams.

Each flaw originated from the same design issue. Channel allowlists were validated using mutable display names rather than permanent identifiers. Attackers could therefore impersonate trusted users simply by changing their display names to match approved accounts.

OpenClaw has since patched these vulnerabilities.

The platform's extensive permissions—including access to files, shell environments, and more than twenty messaging services—have previously prompted warnings regarding prompt injection and data exfiltration risks.

The strongest criticism came from the Dutch data protection authority, the Autoriteit Persoonsgegevens, which advised users and organizations against deploying OpenClaw on systems containing sensitive information due to concerns over data breaches and account compromise.

Recommended Defenses

Organizations using OpenClaw are advised to upgrade immediately to version 2026.4.23 or newer to mitigate the message-object vulnerability identified by Imperva.

However, researchers stress that software updates alone cannot solve the broader trust problem inherent in autonomous AI systems.

Varonis recommends four key safeguards:

  • Treat agent instruction files as strict, version-controlled policies rather than informal guidance.

  • Require approval before agents send messages to unfamiliar recipients, reducing the risk of automated phishing or data leakage.

  • Restrict access to connected systems based on the trustworthiness of the triggering source.

  • Require human review for high-risk actions such as credential sharing, financial transactions, or sensitive data transfers.

Both research teams ultimately advocate the same mindset. Varonis recommends treating AI agents as inexperienced employees with extensive system access but limited judgment, while Imperva describes them as authenticated executors that inherently trust incoming information.

Although vendors continue to introduce patches and protective controls, the fundamental challenge remains unresolved. AI agents derive their usefulness from acting on instructions, processing inputs, and helping users accomplish tasks. Those same characteristics also create opportunities for attackers, and the industry has yet to develop a universal solution.

Critical OpenClaw Flaws Allow Persistent Access and Credential Abuse


 

OpenClaw, a self-hosted AI agent runtime which has gained rapid adoption by enterprises, introduces a new type of security exposure for enterprises as dynamically executed content, external skill integrations, and cloud-based authentication mechanisms are convergent without adequate defensive control mechanisms.

The OpenClaw platform is unlike conventional applications that are constructed using fixed execution logic, as it is capable of accepting untrusted inputs, retrieving and executing third-party code modules, and interacting with connected environments with assigned credentials, effectively extending the trust boundary far beyond the application layer itself. These architectural flexibility and the recently disclosed ClawJacked exploitation technique expose critical weaknesses in authentication handling and token protection within browser-based cloud development environments, according to security researchers. 

It has been demonstrated that malicious web content can exploit active developer sessions to extract sensitive access tokens, thereby granting attackers unauthorized access to source repositories, cloud infrastructures, and privileged enterprise resources. Increasingly, organizations are integrating cloud-native development platforms into their engineering workflows. This disclosure highlights concerns regarding privilege scoping, identity isolation, and other security aspects associated with autonomous AI-powered runtime environments.

A coordinated vulnerability chain, collectively known as the "Claw Chain," was identified by Cyera researchers in response to these concerns, demonstrating how multiple vulnerabilities within OpenClaw can be combined to compromise a system, gain unauthorized access to data, and escalate privileges across affected systems. 

In particular, two vulnerabilities have been assigned CVE-2026-44113 and CVE-2026-2026-44112, which contain time-of-check/time-of-use (TOCTOU) race conditions within the OpenShell managed sandbox backend, which could allow attackers to circumvent sandbox enforcement and interact with files outside of the mounted root. 

In contrast to the first issue, which permits arbitrary write operations which can lead to configuration changes, backdoor installations, and long-term control over compromised hosts, the second issue provides a pathway for unauthorized disclosure of system artifacts, credentials, and sensitive internal data through unauthorized file disclosure. 

Researchers also disclosed CVE-2026-44115, a vulnerability resulting from an incomplete denylist implementation that allows adversaries to conceal shell expansion tokens in heredoc payloads and execute commands that bypass runtime restrictions. 

A fourth vulnerability known as CVE-2026-44118 introduces an improper access control condition in which non-owner loopback clients can impersonate privileged users to manipulate gateway configurations, alter scheduled cron operations, and gain greater control of execution environments through unauthorized use of privileged accounts. These flaws collectively demonstrate the possibility of insufficient isolation, weak privilege boundaries, and inadequate runtime validation mechanisms within modern AI agent infrastructures resulting in a full compromise chain which can sustain stealthy and persistent access despite seemingly isolated weaknesses.

OpenClaw's rapid adoption and permissive architecture have contributed to its rapid transformation from a niche automation framework into a widely deployed AI-driven orchestration environment, further amplifying its security implications.

In late 2025, Austrian engineer Peter Steinberger released a public version of the project that gained wide traction because of its unique capability to provide custom automation capabilities outside of tightly controlled commercial ecosystems. The OpenClaw assistant does not rely on vendor-defined integrations, but rather allows users to develop, modify, and distribute executable "skills."

The result is a large repository containing thousands of automation scenarios developed by the community without centrally managing, categorizing, or validating their security. Due to its “self-hackability” design, where configurations, memory stores, and executable logic are maintained using local Markdown-based structures that can be modified by the user, it has attracted both developer interest and growing scrutiny from security researchers concerned about the absence of hardened trust boundaries. 

It was discovered that hundreds of OpenClaw administrative interfaces were accessible over the internet and did not require authentication. These concerns escalated. Investigations revealed that improperly configured reverse proxies could forward external traffic through localhost-trusted channels, causing the platform to mistakenly treat remote requests as privileged local connections. 

Security researcher Jamieson O'Reilly demonstrated the severity of the issue by gaining access to sensitive assets such as credentials for Anthropic APIs, Telegram bot tokens, Slack environments, and archived conversations. Further research revealed that prompt injection attacks could be used to manipulate the agent to perform unintended behavior by embedding malicious instructions in emails, files, or web content processed by the underlying large language model. 

One such scenario was demonstrated by Matvey Kukuy's delivery of crafted email payloads which coerced the bot to provide private cryptographic keys from the host environment upon receiving instructions to review inbox contents. Several independent experiments have demonstrated the system discloses confidential email data, exposes the contents of home directories via automated shell commands, and searches local storage automatically after receiving psychologically manipulative prompts. 

In aggregate, these incidents illustrate an industry concern that autonomous AI agents operating with wide filesystem visibility, persistent memory, and delegated execution privileges may be highly susceptible to indirect command manipulation when deployed in a manner that does not adhere to strict authentication controls, runtime isolation, and contextual validation controls.

Despite the fact that there is no publicly verified link to any known advanced persistent threat group linking the exploitation of the OpenClaw vulnerabilities, security analysts note that the operational characteristics of the attack are in line with tradecraft commonly utilized in credential theft, browser hijacking, and adversary-in-the-middle intrusion campaigns.

MITRE ATT&CK framework techniques, including T1185 related to browser session hijacking as well as T1557 related to man-in-the-middle attacks, have been identified as parallel techniques, and both of these techniques are frequently used in targeted attacks against enterprise authentication systems and cloud-based environments. There has been a growing concern that financially motivated threat actors and state-aligned operators may incorporate the technique into broader intrusion toolsets due to the availability of publicly available proof-of-concept exploit methods and the relatively low complexity required to weaponize these flaws. 

It was discovered that all versions of OpenClaw and Clawdbot before version 2026.2.2, including all builds up to version 2026.2.1, have been vulnerable to the vulnerability. Researchers stated that in the updated version, unauthorized WebSocket interactions are restricted and authentication checks are enforced on the exposed /cdp interface, which previously permitted unsafe assumptions regarding local trust. 

During the deployment of immediate patches, security teams are advised to monitor for suspicious localhost WebSocket activity, unauthorized browser extension behaviors, and attempts to communicate outbound via ws://127.0.0.1:17892/cdp or infrastructure controlled by known attackers. 

When rapid patching is an operational challenge, experts recommend that the OpenClaw browser extension be temporarily disabled, that host-level firewall restrictions be enforced around local WebSocket services, and that browser session telemetry and endpoint indicators of compromise be continuously reviewed to determine if there has been an unauthorized persistence of credentials or credential interception. 

OpenClaw's vulnerability chain is a reflection of an overall security reckoning taking place in the rapidly expanding AI agent ecosystem, in which convenience-driven automation is outpacing the maturation of defensive safeguards designed to contain it in a rapidly expanding ecosystem. There is an increasing tendency for autonomous assistants to gain access to developer environments, authentication tokens, local storage, messaging platforms, and cloud infrastructure, so that the traditional boundaries between trusted execution and untrusted input are being eroded. 

Platforms with the ability to self-modify, delegate command execution, and persist contextual memory present significant security risks that are fundamentally different from conventional software, particularly when deployed with excessive privileges and inadequate isolation during runtime. 

Despite the fact that OpenClaw's vulnerabilities may be mitigated by patching, access restrictions, and stronger authentication enforcement, the incident emphasizes the larger industry concern that artificial intelligence-driven operational tools may become a high value target for both cybercriminals and advanced intrusion groups in the very near future. 

These findings serve as a reminder that, as organizations adopt autonomous AI systems, security architecture, privilege segmentation, and continuous monitoring must no longer be overlooked.

Chrome Gemini Live Bug Highlighted Serious Privacy Risks for Users


As long as modern web browsers have been around, they have emphasized a strict separation principle, where extensions, web pages, and system-level capabilities operate within carefully defined boundaries. 

Recently, a vulnerability was disclosed in the “Live in Chrome” panel of Google Chrome, a built-in interface for the Gemini assistant that offers agent-like AI capabilities directly within the browser environment that challenged this assumption. 

In a high-severity vulnerability, CVE-2026-0628, security researchers have identified, it is possible for a low-privileged browser extension to inject malicious code into Gemini's side panel and effectively inherit elevated privileges. 

Attackers may be able to evade sensitive functions normally restricted to the assistant by piggybacking on this trusted interface, which includes viewing local files, taking screenshots, and activating the camera or microphone of the device. While the issue was addressed in January's security update, the incident illustrates a broader concern emerging as artificial intelligence-powered browsing tools become more prevalent.

In light of the increasing visibility of user activity and system resources by intelligent assistants, traditional security barriers separating browser components are beginning to blur, creating new and complex opportunities for exploitation. 

The researchers noted that this flaw could have allowed a relatively ordinary browser extension to control the Gemini Live side panel, even though the extension operated with only limited permissions. 

By granting an extension declarativeNetRequest capability, an extension can manipulate network requests in a manner that allows JavaScript to be injected directly into the Gemini privileged interface rather than just in the standard web application pages of Gemini. 

Although request interception within a regular browser tab is considered normal and expected behavior for some extensions, the same activity occurring within the Gemini side panel carried a far greater security risk.

Whenever code executed within this environment inherits the assistant's elevated privileges, it could be able to access local files and directories, capture screenshots of active web pages, or activate the device's camera and microphone without the explicit knowledge of the user. 

According to security analysts, the issue is not merely a conventional extension vulnerability, but is rather the consequence of a fundamental architectural shift occurring within modern browsers as artificial intelligence capabilities become increasingly embedded in the browser. 

According to security researchers, the vulnerability, internally referred to as Glic Jack, short for Gemini Live in Chrome hijack, illustrates how the growing presence of AI-driven functions within browsers can unintentionally lead to new opportunities for abuse. If exploited successfully, the flaw could have allowed an attacker to escalate privileges beyond what would normally be permitted for browser extensions. 

When operating within the trusted assistant interface, malicious code may be able to activate the victim's camera or microphone without permission, take screenshots of arbitrary websites, or obtain sensitive information from local files. Normally, such capabilities are reserved for browser components designed to assist users with advanced automation tasks, but due to this vulnerability, the boundaries were effectively blurred by allowing untrusted code to take the same privileges.

Furthermore, the report highlights that this emerging category of so-called AI or agentic browsers is primarily based on integrated assistants that are capable of monitoring and interacting with user activity as it occurs. There has been a broader shift toward AI-augmented browsing environments, as evidenced by platforms such as Atlas, Comet, and Copilot within Microsoft Edge, as well as Gemini in Google Chrome.

Typically, these platforms feature an integrated assistant panel that summarizes content in real time, automates routine actions, and provides contextual guidance based on the page being viewed. By receiving privileged access to what a user sees and interacts with, the assistant often allows it to perform complex, multi-step tasks across multiple sites and local resources, allowing it to perform these functions. 

CVE-2026-0628, however, presented an unexpected attack surface as a consequence of that same level of integration: malicious code was able to exercise capabilities far beyond those normally available to extensions by compromising the trusted Gemini panel itself.

Chrome 143 was eventually released to address the vulnerability, however the incident underscores a growing security challenge as browsers evolve into intelligent platforms blending traditional web interfaces with deep integrations of artificial intelligence systems. It is noted that as artificial intelligence features become increasingly embedded into everyday browsing tools, the incident reflects an emerging structural challenge. 

Incorporating an agent-driven assistant directly into the browser allows the user to observe page content, interpret context and perform multi-step tasks such as summarizing information, translating text, or completing tasks on their behalf. In order for these systems to provide the level of functionality they require, extensive visibility into the browsing environment and privileged access to browser resources are required.

It is not surprising that AI assistants can be extremely useful productivity tools, but this architecture also creates the possibility of malicious content attempting to manipulate the assistant itself. For instance, a carefully crafted webpage may contain hidden prompts that can influence the behavior of the AI. 

A user could potentially be persuaded-through phishing, social engineering, or deceptive links-to open a phishing-type webpage by the instructions, which could lead the assistant to perform operations which are otherwise restricted by the browser's security model, such as retrieving sensitive data or performing unintended actions, if such instructions are provided.

According to researchers, malicious prompts may be able to persist in more advanced scenarios by affecting the AI assistant's memory or contextual information between sessions in more advanced scenarios. By incorporating instructions into the browsing interaction itself, attackers may attempt to create an indirect persistence scenario that results in the assistant following manipulated directions even after the original webpage has been closed by embedding instructions within the browsing interaction itself. 

In spite of the fact that such techniques remain largely theoretical in many environments, they show how artificial intelligence-driven interfaces create entirely new attack surfaces that traditional browser security models were not designed to address. Analysts have cautioned that integrating assistant panels directly into the browser's privileged environment can also reactivate longstanding web security threats. 

Researchers at Unit 42 have found that placement of AI components within high-trust browser contexts might inadvertently expose them to bugs such as cross-site scripting, privilege escalation, and side-channel attacks. 

Omer Weizman, a security researcher, explained that embedded complex artificial intelligence systems into privileged browser components increases the likelihood that unintended interactions can occur between lower privilege websites or extensions due to logical or implementation oversights. It is therefore important to point out that CVE-2026-0628 serves as a cautionary example of how advances in AI-assisted browsing must be accompanied by equally sophisticated security safeguards in order to ensure that convenience does not compromise the privacy of the user or the integrity of the system. 

There is no doubt that the discovery serves as a timely reminder to security professionals and browser developers regarding the need for a rigorous approach to security design and oversight in the rapid integration of artificial intelligence into core browsing environments. With the increasing capabilities of assistants embedded within platforms, such as Google Chrome, to observe content, interact with system resources, and automate complex workflows through services such as Gemini, the traditional browser trust model has to evolve in order to accommodate these expanded privileges.

Moreover, researchers recommend that organizations and users remain cautious when installing extensions on their browsers, keep browsers up to date with the latest security patches, and treat AI-powered automation features with the same scrutiny as other high-privilege components. It is also important for the industry to ensure that the convenience offered by intelligent assistants does not outpace the safeguards necessary to contain them. 

As the next generation of artificial intelligence-augmented browsers continues to develop, strong isolation boundaries, hardened interfaces, and an anticipatory response to prompts will likely become essential priorities.