Researchers Expose AI Prompt Injection Attack Hidden in Images

Researchers have disclosed a new type of cyberattack that can steal sensitive user data by embedding hidden prompts inside images processed by AI platforms. The malicious instructions are invisible to the human eye but become legible once the images are downscaled with common resampling algorithms before being passed to a large language model (LLM).

The technique, developed by Trail of Bits researchers Kikimora Morozova and Suha Sabi Hussain, builds on a 2020 USENIX paper by researchers at TU Braunschweig that first proposed the concept of image-scaling attacks against machine learning systems.

Typically, when users upload pictures to AI tools, the images are automatically downscaled to a lower resolution for performance and cost efficiency. Depending on the resampling method (nearest neighbor, bilinear, or bicubic interpolation), aliasing artifacts can emerge that unintentionally reveal hidden patterns if the source image was crafted for that purpose.
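The dependence on the resampling filter is easy to see in a short sketch. The following Python snippet (not from the research; the file name and the 256x256 target size are assumptions) downscales the same upload with Pillow's nearest-neighbor, bilinear, and bicubic filters, producing three different low-resolution views of the same image:

```python
# Downscale one upload with three common resampling filters to compare the
# low-resolution images an AI pipeline might actually pass to the model.
# "upload.png" and the 256x256 target are illustrative assumptions.
from PIL import Image

TARGET = (256, 256)

img = Image.open("upload.png").convert("RGB")

previews = {
    "nearest": img.resize(TARGET, Image.Resampling.NEAREST),
    "bilinear": img.resize(TARGET, Image.Resampling.BILINEAR),
    "bicubic": img.resize(TARGET, Image.Resampling.BICUBIC),
}

for name, preview in previews.items():
    # Each filter weights source pixels differently, so the aliasing artifacts
    # (and any pattern crafted to exploit them) differ per algorithm.
    preview.save(f"preview_{name}.png")
```

Comparing the three outputs shows why a payload crafted for one filter generally does not survive another.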

In one demonstration by Trail of Bits, carefully engineered dark areas of a malicious image shifted color during bicubic downscaling, revealing black text that the AI system interpreted as additional user instructions. Everything appeared normal to the end user, but the model silently executed the hidden commands, potentially leaking data or performing other harmful actions.
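The bicubic payloads used in the demonstration are crafted numerically, but the underlying idea can be illustrated with the simplest filter, nearest-neighbor, which keeps only a sparse grid of source pixels. The sketch below is a simplified illustration, not the researchers' bicubic method; the file names and dimensions are assumptions. It writes a low-resolution text payload into exactly the pixels that survive downscaling, so the text appears only in the downscaled copy:

```python
# Simplified nearest-neighbor illustration of an image-scaling payload.
# The actual Trail of Bits payloads target bicubic and other filters and are
# generated differently; sizes and file names here are assumptions.
import numpy as np
from PIL import Image

SIZE_SRC = 1024  # full-resolution square image the user sees
SIZE_DST = 256   # resolution assumed to reach the model after downscaling

# 1. Probe which source pixels nearest-neighbor downscaling samples: encode
#    each source pixel's flat index, downscale, and read the surviving indices.
idx = np.arange(SIZE_SRC * SIZE_SRC, dtype=np.int32).reshape(SIZE_SRC, SIZE_SRC)
sampled = np.array(
    Image.fromarray(idx).resize((SIZE_DST, SIZE_DST), Image.Resampling.NEAREST)
)
rows, cols = np.unravel_index(sampled.ravel(), (SIZE_SRC, SIZE_SRC))

# 2. Load a benign-looking decoy and a low-resolution image of the hidden
#    prompt text.
decoy = np.array(Image.open("decoy.png").convert("RGB").resize((SIZE_SRC, SIZE_SRC)))
payload = np.array(Image.open("prompt_text.png").convert("RGB").resize((SIZE_DST, SIZE_DST)))

# 3. Overwrite only the sampled pixels. At full resolution the change is a
#    sparse pattern (1 of every 16 pixels here); after nearest-neighbor
#    downscaling only these pixels remain, so the model sees the payload text.
crafted = decoy.copy()
crafted[rows, cols] = payload.reshape(-1, 3)
Image.fromarray(crafted).save("crafted_upload.png")
```

Bilinear and bicubic filters average neighborhoods of pixels, so crafting payloads for them means solving for values whose weighted averages form the target text, which is why each payload must be tailored to the specific algorithm.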

In practice, the team showed how the weakness could be exploited against Gemini CLI, where a hidden prompt triggered the exfiltration of Google Calendar data to an external email address. Because Zapier MCP was configured with trust=True, the resulting tool calls were approved automatically, without user confirmation.

The researchers emphasized that the success of such attacks depends on tailoring the malicious image to the specific downscaling algorithm used by each AI system. Their testing confirmed the method’s effectiveness against:

  1. Google Gemini CLI
  2. Vertex AI Studio (Gemini backend)
  3. Gemini’s web interface
  4. Gemini API via llm CLI
  5. Google Assistant on Android
  6. Genspark

Given the broad scope of this vulnerability, the team developed Anamorpher, an open-source tool (currently in beta) that generates attack-ready images tailored to each of several downscaling methods.

To defend against this threat, Trail of Bits recommends that AI platforms enforce image dimension limits, provide a preview of the downscaled output before submission to an LLM, and require explicit user approval for sensitive tool calls—especially if text is detected in images.
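A minimal sketch of the first two recommendations, assuming a Pillow-based pipeline; the 2048-pixel limit, the 512x512 target, and the bicubic filter are placeholder values, not settings from any specific platform:

```python
# Hedged sketch of two of the recommended mitigations: reject oversized uploads
# and show the user the exact downscaled image the model will receive.
# MAX_DIM, TARGET, and the bicubic choice are assumptions, not platform values.
from PIL import Image

MAX_DIM = 2048          # assumed upper bound on accepted image dimensions
TARGET = (512, 512)     # assumed resolution forwarded to the LLM

def prepare_upload(path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")

    # Mitigation 1: enforce image dimension limits before any processing.
    if img.width > MAX_DIM or img.height > MAX_DIM:
        raise ValueError(f"image exceeds {MAX_DIM}px limit: {img.size}")

    # Mitigation 2: downscale with the same filter the backend uses and
    # surface this preview to the user before it is sent to the model.
    preview = img.resize(TARGET, Image.Resampling.BICUBIC)
    preview.save("what_the_model_sees.png")
    return preview

# Usage: show "what_the_model_sees.png" to the user and proceed (including
# approving any sensitive tool calls) only after explicit confirmation.
if __name__ == "__main__":
    prepare_upload("upload.png")
```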

"The strongest defense, however, is to implement secure design patterns and systematic defenses that mitigate impactful prompt injection beyond multi-modal prompt injection," the researchers said, pointing to their earlier paper on robust LLM design strategies.
