
Jailbroken Mistral And Grok Tools Are Used by Attackers to Build Powerful Malware


The latest findings by Cato Networks suggest that a number of jailbroken and uncensored AI tool variants marketed on hacker forums were probably built on top of well-known commercial large language models such as Mistral AI's models and X's Grok.

While some commercial AI companies have built safety and security safeguards into their models to stop them from writing malware, providing detailed bomb-making instructions, or engaging in other malicious behaviours, a parallel underground market has developed that sells uncensored versions of the technology.


Named after one of the first AI tools that was promoted on underground hacker forums in 2023, these "WormGPTs" are typically assembled from open-source models and other toolkits and are capable of generating code, searching for and analysing vulnerabilities, and then being sold and marketed online. 

However, Vitaly Simonovich, a researcher at Cato Networks, reveals that two variations promoted on BreachForums in the last year had straightforward origins. “Cato CTRL has discovered previously unreported WormGPT variants that are powered by xAI’s Grok and Mistral AI’s Mixtral,” he wrote. 

One version, accessible via Telegram, was promoted on BreachForums in February. It referred to itself as an "Uncensored Assistant" but otherwise described its function in a positive and uncontroversial manner. After gaining access to both models and beginning his investigation, Simonovich discovered that they were, as promised, largely unfiltered.

In addition to other offensive capabilities, the models could create phishing emails and generate credential-stealing PowerShell malware on demand. However, Simonovich found prompt-based guardrails designed to hide one thing: the initial system prompts used to configure those models. He was able to evade the constraints by using an LLM jailbreaking technique to access the first 200 tokens processed by the system. The response identified xAI's Grok as the underlying model powering the tool.

“It appears to be a wrapper on top of Grok and uses the system prompt to define its character and instruct it to bypass Grok’s guardrails to produce malicious content,” Simonovich added.

Another WormGPT variant, promoted in October 2024 with the subject line "WormGPT / 'Hacking' & UNCENSORED AI," was described as an artificial intelligence-based language model focused on "cyber security and hacking issues." The seller stated that the tools give customers "access to information about how cyber attacks are carried out, how to detect vulnerabilities, or how to take defensive measures," but emphasised that neither they nor the product accept legal responsibility for the user's actions.