Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

OpenClaw Security Flaws Expose AI Agents to Hidden Commands and Data Theft Risks

Imperva researchers discovered that OpenClaw could be tricked into processing concealed instructions embedded within shared contacts.

 

Two independent cybersecurity studies published this week have uncovered serious security weaknesses in OpenClaw, a widely used self-hosted AI agent platform. The findings demonstrate how attackers can manipulate AI agents into executing malicious code or leaking sensitive information through seemingly harmless inputs.

Researchers from Imperva and Varonis approached the issue from different angles but reached a similar conclusion: AI agents that trust incoming data and possess broad system access can become powerful attack vectors when exploited.

Hidden Instructions Embedded in Everyday Content

Imperva researchers discovered that OpenClaw could be tricked into processing concealed instructions embedded within shared contacts, vCards, and location pins. These malicious commands were executed by the AI agent without any visible indication to the user.

The issue stemmed from how OpenClaw handled certain message objects before passing them to the large language model (LLM). While content fetched from the web was clearly marked as untrusted, information contained within contacts, vCards, and location labels was inserted directly into prompts without any trust boundary.

According to Imperva researcher Yohann Sillam, this allowed attackers to hide instructions inside fields such as contact names. Since angle brackets are permitted in contact names, the model could not reliably distinguish legitimate information from injected commands.

Only selected fields were transmitted to the model, making them attractive targets. In one example, a shared contact was serialized as <contact: name, number>, allowing attackers to insert malicious instructions within the name field itself. Because messaging apps truncate long contact names, victims often never saw the hidden payload.

The same attack method was also successful through WhatsApp-supported vCards and shared location labels.

During testing against Gemini 3.1 Pro's preview build, hidden instructions successfully convinced the AI agent to download and execute a script hosted on servers controlled by the researchers. Similar attempts using images with embedded instructions failed, likely because AI models have become more resistant to that well-known attack technique.

Imperva warned that OpenClaw's default memory functionality could amplify the threat. A single malicious piece of widely shared content could potentially affect multiple agents if adequate sandboxing protections were absent.

Following responsible disclosure, OpenClaw addressed the issue in version 2026.4.23. The update separates contact names, vCard information, and location labels from the main prompt and places them in an isolated untrusted metadata channel.

Researchers also noted that similar design patterns exist in several other personal AI assistant platforms, suggesting the issue extends beyond OpenClaw alone.

Social Engineering Defeats Technical Safeguards

While Imperva focused on prompt injection, Varonis Threat Labs explored how AI agents respond to social engineering attacks.

Led by researcher Itay Yashar, the Varonis team created an OpenClaw-based agent called Pinchy and connected it to a Gmail inbox filled with realistic business communications and synthetic sensitive information. The researchers then tested the agent using four different phishing scenarios involving Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

Varonis distinguishes traditional prompt injection from what it calls "agent phishing." Unlike hidden instructions embedded in content, agent phishing relies on convincing requests delivered through normal communication channels, exploiting the agent's willingness to act before verifying legitimacy.

The tests revealed significant weaknesses.

In one scenario, an email impersonating a team leader named Dan requested urgent staging access during a simulated production emergency. The message originated from an external Gmail account, yet the agent located and forwarded mock AWS IAM access keys, database connection credentials, and SSH details in plain text.

A second phishing attempt used a more routine business request, asking for a weekly customer export supposedly needed for a QBR presentation. The agent responded by sending a synthetic database containing information on 247 enterprise customers, including contact details and contract values.

Notably, these failures occurred despite the agent being configured with instructions to verify sender identities before responding. Researchers observed that urgency successfully bypassed safeguards in one case, while routine business language defeated them in another.

The agent demonstrated stronger performance against technically oriented threats. It interacted with a phishing page designed to steal gift-card credentials but ultimately withheld sensitive information and flagged suspicious behavior. A stricter configuration blocked the page entirely.

Similarly, when presented with a malicious OAuth consent screen disguised as a timesheet application, the agent examined the redirect destination, recognized warning signs, and refused access.

Researchers concluded that AI agents may outperform many users when identifying suspicious URLs and fraudulent login portals. However, they remain vulnerable to social manipulation that exploits helpfulness and trust.

Varonis also observed that OpenAI Codex GPT-5.4 behaved more cautiously than Gemini 3.1 Pro when interacting with external websites or transmitting data. Nevertheless, both models ultimately fell victim to the social-engineering scenarios.

One Core Problem Behind Multiple Attacks

Varonis linked both attack methods to what researcher Simon Willison describes as the "lethal trifecta": an AI system capable of accessing private data, consuming untrusted content, and transmitting information externally.

OpenClaw satisfies all three conditions, making both hidden prompt injections and phishing-based attacks highly effective.

Additional concerns emerged from a separate InfoSec Write-ups analysis. Researchers converted historical OpenClaw security advisories into static-analysis rules and uncovered five additional vulnerabilities affecting integrations with Slack, Discord, Matrix, Zalo, and Microsoft Teams.

Each flaw originated from the same design issue. Channel allowlists were validated using mutable display names rather than permanent identifiers. Attackers could therefore impersonate trusted users simply by changing their display names to match approved accounts.

OpenClaw has since patched these vulnerabilities.

The platform's extensive permissions—including access to files, shell environments, and more than twenty messaging services—have previously prompted warnings regarding prompt injection and data exfiltration risks.

The strongest criticism came from the Dutch data protection authority, the Autoriteit Persoonsgegevens, which advised users and organizations against deploying OpenClaw on systems containing sensitive information due to concerns over data breaches and account compromise.

Recommended Defenses

Organizations using OpenClaw are advised to upgrade immediately to version 2026.4.23 or newer to mitigate the message-object vulnerability identified by Imperva.

However, researchers stress that software updates alone cannot solve the broader trust problem inherent in autonomous AI systems.

Varonis recommends four key safeguards:

  • Treat agent instruction files as strict, version-controlled policies rather than informal guidance.

  • Require approval before agents send messages to unfamiliar recipients, reducing the risk of automated phishing or data leakage.

  • Restrict access to connected systems based on the trustworthiness of the triggering source.

  • Require human review for high-risk actions such as credential sharing, financial transactions, or sensitive data transfers.

Both research teams ultimately advocate the same mindset. Varonis recommends treating AI agents as inexperienced employees with extensive system access but limited judgment, while Imperva describes them as authenticated executors that inherently trust incoming information.

Although vendors continue to introduce patches and protective controls, the fundamental challenge remains unresolved. AI agents derive their usefulness from acting on instructions, processing inputs, and helping users accomplish tasks. Those same characteristics also create opportunities for attackers, and the industry has yet to develop a universal solution.

Share it:

AI Agent Security

AI Phishing Attacks

OpenClaw security vulnerability

prompt injection attack

Vulnerabilities and Exploits