Search This Blog

Powered by Blogger.

Blog Archive

Labels

Footer About

Footer About

Labels

Showing posts with label zealot. Show all posts

Researchers Show How ChatGPT Summaries Could Be Used for Phishing Attacks

 


Researchers have identified a technique that could allow malicious content embedded within a web page to appear inside ChatGPT responses, creating an opportunity for phishing, tracking, and social-engineering attacks through a platform users generally regard as trustworthy.

The attack method, named "ChatGPhish" by cybersecurity firm Permiso Security, focuses on how ChatGPT handles Markdown-formatted content when summarizing information from external websites. Markdown is a commonly used formatting language that allows web content to include elements such as hyperlinks and images.

According to Permiso Security researcher Andi Ahmeti, ChatGPT's web interface trusts Markdown links and image URLs originating from third-party pages that users ask the assistant to summarize. When a response is generated, the platform can automatically retrieve those images and present hyperlinks as active, clickable elements within the chatbot's interface.

In a scenario outlined by the researchers, an attacker could place a small hidden payload within a web page. If a user later asks ChatGPT to summarize that page, the embedded content may become part of the model's processing context. During response rendering, attacker-controlled images could be automatically requested, potentially exposing information such as the visitor's IP address, browser User-Agent string, and Referer data.

The researchers also found that links embedded in a manipulated page could appear as legitimate clickable items inside the AI-generated summary. Beyond directing users to phishing destinations, attackers could display fabricated security notifications, account-warning messages designed to imitate system alerts, or QR codes hosted on attacker-controlled infrastructure such as an Amazon S3 bucket. A victim scanning such a code with a mobile device could be redirected to a malicious destination, bypassing certain desktop-based URL filtering mechanisms and enterprise security controls.

The research adds to a growing body of evidence showing that AI-powered summarization tools can become unintended delivery channels for attacker instructions. Earlier this year, Permiso Security disclosed a separate attack involving Microsoft Copilot, where specially crafted instructions hidden inside an email influenced the output generated by the AI assistant. That technique was classified as a cross-prompt injection attack, also known as indirect prompt injection.

According to the researchers, the primary issue is not simply that prompt injection is possible. The more significant concern is how the manipulated content is ultimately presented to the user. A standard web page summarized by ChatGPT can cause phishing links, deceptive warnings, QR codes, and remotely hosted content to be displayed directly inside the assistant's interface, giving attacker-controlled material an appearance of legitimacy.

As AI assistants become common tools for workplace research, document review, and information gathering, this behavior introduces a new risk. Any web page processed by an employee could potentially contain hidden instructions or malicious content capable of influencing both the generated summary and the way that information is displayed.

Permiso Security noted that this shifts phishing activity beyond traditional delivery methods. Users no longer need to open a suspicious attachment or interact with an obviously fraudulent email. In some cases, simply asking an AI assistant to summarize a webpage may expose them to attacker-controlled content.

The disclosure arrives alongside research from Adversa AI detailing two attack techniques aimed at AI coding assistants and agentic development tools. The first, known as SymJack, allows a malicious code repository to achieve remote code execution through an AI-powered coding assistant.

According to Adversa AI researcher Rony Utevsky, the attack relies on convincing the AI assistant to perform what appears to be a harmless file-copy operation. The destination, however, is a symbolic link pointing to the assistant's own configuration file. As a result, attacker-controlled content is written into the configuration. When the assistant is restarted, a malicious Model Context Protocol (MCP) server is launched and executes arbitrary code using the victim's privileges.

The second technique, called TrustFall, uses a repository containing a malicious MCP server together with configuration settings that automatically approve its execution. A developer only needs to clone or open the repository in an AI coding environment and accept a folder-trust prompt. Once that action is taken, the attacker-controlled MCP server can start automatically without requiring additional tool approval, running with the same operating-system permissions as the developer.

Adversa AI explained that a victim who clones the repository, launches Claude, and accepts the generic trust prompt effectively allows the malicious MCP server to start as a native process on the machine. The payload executes immediately when the server starts, before additional prompts or tool requests occur.

The ChatGPhish findings emerge amid a steady stream of research examining weaknesses in modern AI systems, coding agents, and autonomous workflows.

Researchers recently described a jailbreak method called Involuntary In-Context Learning (IICL), which exploits the tension between a model's contextual learning behavior and its safety mechanisms to bypass protections in GPT-5.4.

Separate research from Cisco found that many AI security evaluations fail to reflect how real-world attackers operate. Rather than relying on a single prompt, attackers often use multiple interactions, gradually changing their wording, adopting different personas, and breaking objectives into smaller steps. Cisco argued that single-turn testing overlooks these techniques because real attacks frequently unfold across extended conversations.

Additional research has uncovered a vulnerability affecting Anthropic Claude Code in which a user-level configuration file, "~/.claude.json," can be altered through a rogue npm package. The attack enables modification of MCP endpoints and can place an attacker between Claude Code and an OAuth-protected MCP server, creating an opportunity to capture authentication tokens used to access downstream software-as-a-service platforms.

Researchers have also documented a technique involving OpenClaw skills that appear harmless during installation but later retrieve remote updates. In one scenario, attackers can influence an AI agent through workspace files after instructing users to append specific content to a file called HEARTBEAT.md during setup.

Another study demonstrated how hidden text embedded inside phishing emails can manipulate AI-based email security products. Attackers concealed text taken from legitimate newsletters and romance novels to make malicious messages appear benign to automated filtering systems.

LayerX researchers separately disclosed a flaw known as ClaudeBleed affecting Claude's Chrome extension. According to the company, any browser extension, including one without elevated permissions, could communicate with Claude's language model through the extension's content script because the code does not adequately verify the source of incoming instructions. This could allow another extension to issue commands and trigger actions through the AI assistant.

Cisco researchers also examined typographic prompt injection attacks against vision-language models. In these attacks, adversarial text is embedded inside images. The manipulated image may appear unreadable or resemble visual noise to humans and OCR-based filters while remaining interpretable to the target AI model.

Other recently disclosed vulnerabilities include flaws in Microsoft Semantic Kernel, tracked as CVE-2026-25592 and CVE-2026-26030, which researchers said could allow prompt-injection attacks to progress into host-level remote code execution.

Researchers additionally described the Neural Exec attack and abuse of the Unicode right-to-left-override function to bypass safety mechanisms protecting Apple's local AI models. The issue has since been addressed in iOS 26.4 and macOS 26.4.

A separate indirect prompt-injection vulnerability known as WebPromptTrap affected BrowserOS, an open-source agentic browser. The technique relied on hidden instructions embedded in an otherwise legitimate article to influence an AI-generated summary and persuade users to approve an authorization request. The issue was patched in BrowserOS version 0.32.0.

Research into the broader AI-agent ecosystem has uncovered persistent security weaknesses. An audit covering 3,984 skills published through ClawHub and skills.sh found that 534 skills, representing 13.4% of the total, contained at least one critical security issue. Researchers also identified 1,467 skills with broader weaknesses, including malware distribution risks, prompt-injection opportunities, exposed secrets, hard-coded API credentials, insecure handling of authentication data, and unsafe exposure to third-party content.

Additional studies identified attacks against NemoClaw, NVIDIA's reference framework for securing OpenClaw agents. Researchers demonstrated methods for extracting OpenClaw data through the platform's default sandbox configuration using either a malicious GitHub repository or a compromised npm package.

Security researchers are increasingly examining how advances in AI capability could affect offensive cyber operations. According to researchers at Palo Alto Networks Unit 42, more capable AI models could allow attackers to exploit both newly discovered and previously known vulnerabilities at a scale, speed, and level of automation that has traditionally required specialized expertise.

Last month, Unit 42 presented a proof-of-concept AI agent called Zealot that was capable of carrying out cloud attack operations with limited human involvement. The system chained together reconnaissance, exploitation, privilege escalation, and data-exfiltration activities by leveraging known weaknesses and misconfigurations.

Researchers argue that cloud environments are particularly susceptible to this type of automation because most administrative functions are accessible through APIs, multiple discovery mechanisms exist for identifying resources, configuration errors remain common, and access control often depends heavily on credentials.

According to Unit 42 researchers Yahav Festinger and Chen Doytshman, current large language models are already capable of coordinating reconnaissance, exploitation, privilege escalation, and data theft activities with relatively little human guidance. The techniques themselves are not necessarily new. What is changing is the speed and scale at which those established attack patterns can now be executed through AI-assisted automation.