Search This Blog

Powered by Blogger.

Blog Archive

Labels

Showing posts with label Jailbreaking. Show all posts

Breaking Boundaries: Language Models Empower Each Other in Automated Jailbreaking

 


Increasing usage of large language models in industry has resulted in a flood of research activity that aims to find out whether LLMs have a tendency to generate hurtful or biased content when persuaded in a particular way or when using specific inputs.

It has just been published in a new paper from researchers at Robust Intelligence and Yale University that describes the latest development in the field of black box LLMs, which promises to enable even the most state-of-the-art black box LLMs to escape guardrails and generate toxic output by fully automating the process. 

A new preprint study shows just how to trick AIs into giving up some of the secrets they have been keeping from users that can be dangerous in the future. Currently, chatbots have built-in restrictions to keep them from revealing anything dangerous to users. As most people are aware, today's machines can act as fictional characters or mimic specific personalities by feigning to have specific personalities or posing as specific roles. 

Using that ability, the new study was able to enlist the assistance of a chatbot which has been used extensively in artificial intelligence to get the job done. Taking advantage of this assistant, the researchers directed him to work on prompts that would be able to "jailbreak" other chatbots-destroying the guardrails that had been embedded into them. 

The term "black box LLM" refers to a large language model, such as the one behind ChatGPT, which is not publicly available regarding its architecture, datasets, training methodologies, and other details of development. A new method, which has been dubbed Tree of Attacks with Pruning (TAP) by the researchers, consists of using a nonaligned LLM to "jailbreak" another aligned LLM in order to break through its guardrails and to reach its goals quite swiftly and effectively. 

It should be noted that the objective of an LLM designed for alignment such as the one behind ChatGPT and other AI chatbots is explicitly to minimize the potential for harm and would not, for instance, provide information on how to build a bomb in response to a request for such information. A non-aligned LLM is optimized in order to increase accuracy as well as to contain fewer constraints compared to an aligned LLM. 

With the help of models like ChatGPT, users have been delighted by the ability of these models to process outside prompts and (in some cases) produce organized, actionable responses based on massive data sets that have been collected in the past. As a result, there are a number of possible uses for artificial intelligence that have expanded our collective understanding of what is possible in the age of artificial intelligence. 

When these LLMs began to become widely used by the public, however, they started causing a host of problems as soon as they became widely known. The problems began to arise from hallucination (the act of inventing facts, studies, or events in an elaborate manner) as well as inaccurate information provided to the opposing party. Provide accurate (but objectionable, dangerous, or harmful) answers to questions like, "How do I build a bomb?" or "How can I write a program to exploit this vulnerability?" LLM-based AI systems can be attacked using a variety of attack tactics, and this is true for several different kinds of AI. 

A prompt attack can be defined as the act of using prompts to make the model produce answers that, by definition, it should not produce in theory. The problem with AI models is that they can be backdoored (forced to generate incorrect outputs when triggered) and their training data can be extracted - or poisoned - to generate incorrect outputs. 

In situations of adversarial examples, a model can be "confused" with unexpected (but predictable) results due to inputs generated by adversarial examples. Researchers from Yale and Robust Intelligence have developed a machine learning technique that uses automated adversarial adversarial learning to defeat that last category of attacks by overriding the control structures (“guardrails”) that normally prevent them from achieving success. 

There are many LLMs on the market that feature AI to generate useful content at scale, and GPT-4 is one such example. As a result, if these capabilities are not checked, those same capabilities may also be used for harmful purposes. Recent research has led to the development of techniques designed to retool LLMs into malicious systems used to mislead, contaminate, and commit fraud in an attempt to increase their power. 

There is also an opportunity for misuse of open-source LLMs that lack safety measures, which can be run automatically on a local machine without any restrictions. In the case of GPT-Neo, for instance, it can be considered a major security risk when it is not used under one's control. 

An LLM-based AI system can be attacked to produce results for a variety of reasons, and this is true for many different kinds of AI systems. One such attack is prompting the model with questions which are intended to induce it to produce answered in a way it should not be able to do based on the model's definition.

Attackers Use Underground Hacking Forum to Strip Activation Lock from iPhones

 

Checkm8.info, an underground hacking forum is offering users a convenient way to strip ‘activation lock’ from iPhones with its pay-for-hacking service. However, iOS security analysts believe the hackers are tricking people to remove protections from stolen iPhones. 

Activation lock essentially prohibits anyone from activating the device until the owner enters the requested credentials. The lock is enabled when the administrator sets up Find My, the Apple service that allows people to track the location of their iPhone, Mac, or Apple Watch. 

“Activation Lock,” a text popup across the iPhone’s screen read. “This iPhone is linked to an Apple ID. Enter the Apple ID and password that were used to set up this iPhone.” 

The hackers are using checkra1n, an open-source jailbreaking tool published in 2019. Checkra1n employs an exploit called checkm8 designed by the developer known as Axi0mX. According to checkm8.info’s website, Checkm8 is only applicable for devices running iOS versions 12 to 14.8.1 because the latest iPhones have updated bootrom code that is not susceptible to checkm8. 

A video posted on checkm8.info’s website shows how smoothly the process of using the checkm8.info tool is. A user only needs to download the software, install it, open it up, and finally plug it into Mac or PC. Subsequently, the site charges $69.99 per license. 

“Done! You have successfully bypassed the iCloud activation lock on your device,” the video’s female narrator explains. 

Additionally, Checkm8.info provides a service called “Bypass iPhone Passcode.” This service tool is not identical to established iPhone unlocking services such as Cellebrite and GrayShift. “This service restores the device to factory settings and activates it as a new device using a saved activation ticket from the system. So basically, this method has nothing with brute-forcing or user data leak. Passcode phrase is a common name used by other tools for this service so we decided to give it the same name,” the checkm8.info administrator explained. 

Three years ago in 2019, security researcher axi0mX uncovered checkm8, an exploit that enabled the jailbreak of millions of iOS devices. The exploit lay in the bootrom of the compromised devices. Before 2019, the last iOS bootrom-based jailbreak was published way back in 2009, making the Checkm8 exploit even more astonishing feet since many believed the hardware avenue for rooting devices had long been shut down closed.