Growing reliance on locally deployed large language model infrastructure has renewed scrutiny of the security posture of self-hosted artificial intelligence platforms, after researchers disclosed a critical vulnerability in Ollama that could allow remote attackers to access sensitive process memory without authorization.
Tracked as CVE-2026-7482 and carrying a CVSS severity score of 9.1, the flaw is an out-of-bounds read that can expose large portions of memory belonging to running Ollama processes, including user prompts, system instructions, configuration data, and environment variables.
Because Ollama is widely used as a local inference platform for open-source large language models such as Llama and Mistral, the disclosure has raised significant concern across the artificial intelligence and cybersecurity communities.
The platform lets organizations and developers run AI workloads directly on their own infrastructure rather than relying on external cloud providers.
With approximately 170,000 stars on GitHub, over 100 million Docker Hub downloads, and a deployment footprint of nearly 300,000 internet-accessible servers, Ollama illustrates both the growing security risks of rapidly adopted artificial intelligence ecosystems and the sensitivity of the operational data they process.
Cyera, which identified the vulnerability and dubbed it Bleeding Llama, traces it to insecure handling of GGUF model files within Ollama: the server implicitly trusts tensor dimension values embedded inside uploaded models without performing adequate boundary validation. Through this design weakness, an attacker can craft GGUF files that manipulate memory access operations during model processing, forcing the server to read data outside its intended memory buffers and to incorporate fragments of sensitive runtime information into the model artifacts it generates.
The underlying problem lies in how Ollama handles the GPT-Generated Unified Format (GGUF), which is widely used to package and distribute large language models for efficient local execution. Like PyTorch's .pt and .pth models, safetensors, and ONNX models, GGUF enables developers to store and execute open-source models directly on local machines without external resources.
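Because the flaw hinges on values read straight out of a GGUF file, it helps to see where those untrusted numbers enter a parser. The sketch below reads the fixed GGUF header fields per the public GGUF specification; the file name and the abbreviated error handling are illustrative, and the code is not taken from Ollama's source.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("model.gguf") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Fixed-size GGUF header, little-endian per the public spec.
	var hdr struct {
		Magic       [4]byte // must be "GGUF"
		Version     uint32  // format version
		TensorCount uint64  // attacker-controlled in a malicious file
		MetaKVCount uint64  // number of metadata key/value pairs
	}
	if err := binary.Read(f, binary.LittleEndian, &hdr); err != nil {
		panic(err)
	}
	if string(hdr.Magic[:]) != "GGUF" {
		panic("not a GGUF file")
	}

	// Everything that follows the header (tensor names, shapes, offsets) is
	// equally untrusted and must be bound-checked before it is used to size
	// buffers or drive memory reads.
	fmt.Printf("version=%d tensors=%d metadata entries=%d\n",
		hdr.Version, hdr.TensorCount, hdr.MetaKVCount)
}
```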
The vulnerability stems from the way Ollama processes these files during model creation, specifically through the use of Go's unsafe package in a function named WriteTo(). Because the implementation relies on low-level memory operations that bypass the language's standard safety protections, malicious tensor metadata can trigger out-of-bounds reads of the heap.
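The pattern the researchers describe can be illustrated with a deliberately simplified fragment. This is not Ollama's code; it only shows how an element count taken from untrusted tensor metadata, combined with the unsafe package, yields a memory view that extends past the real buffer.

```go
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	data := []byte("actual tensor bytes") // the real, small buffer

	// Shape values as they might arrive from a crafted GGUF file.
	dims := [2]uint64{1 << 16, 4} // deliberately oversized
	count := dims[0] * dims[1]

	// Vulnerable pattern: unsafe.Slice skips bounds checking, so the view
	// spans heap memory adjacent to 'data'. Copying it into an output
	// artifact would leak that memory; creating it here is enough to show
	// the mismatch between the claim and the allocation.
	view := unsafe.Slice(unsafe.SliceData(data), count)
	fmt.Printf("buffer holds %d bytes, metadata claims %d\n", len(data), len(view))

	// The fix is to validate claimed sizes against the bytes actually present.
	if count > uint64(len(data)) {
		fmt.Println("reject: tensor shape exceeds available data")
	}
}
```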
To exploit the flaw, an attacker crafts a GGUF file with intentionally oversized tensor shape values and sends it to an exposed Ollama instance via the /api/create endpoint. The manipulated dimensions force the application to access memory regions outside the allocated boundaries during parsing and model generation, unintentionally disclosing sensitive information held in the Ollama process space.
According to the researchers, exposed memory may contain environment variables, authentication tokens, API credentials, system prompts, and portions of concurrent user interactions processed by the same instance. Unlike conventional exploitation techniques, CVE-2026-7482 operates as a silent disclosure mechanism, leaking data without crashes, visible failures, or immediate forensic indicators.
For internet-accessible deployments, the attack chain itself is relatively straightforward, which significantly lowers the bar for remote exploitation. An attacker uploads a malicious GGUF model through the unauthenticated /api/create endpoint, and the manipulated tensor dimensions coerce Ollama into harvesting unintended memory regions during parsing and artifact generation.
The resulting artifact, now carrying sensitive process data, can then be exported through the unauthenticated /api/push endpoint, allowing covert exfiltration of the stolen information.
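For defenders, a first step is simply determining whether an instance answers unauthenticated requests from the network at all. The sketch below probes Ollama's documented GET /api/version endpoint as a low-risk reachability check; the target address is a placeholder.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 5 * time.Second}

	// 192.0.2.10 is a documentation address; substitute the host under review.
	resp, err := client.Get("http://192.0.2.10:11434/api/version")
	if err != nil {
		fmt.Println("not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if resp.StatusCode == http.StatusOK {
		// An unauthenticated 200 here means the API, including /api/create
		// and /api/push, is likely open to anyone who can reach the port.
		fmt.Printf("exposed Ollama API responded without auth: %s\n", body)
	}
}
```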
Since many Ollama instances remain directly exposed to the internet without adequate access restrictions, security researchers say the vulnerability poses a particularly serious risk to enterprises and developers who adopted local AI infrastructure on the assumption that self-hosted deployments provide stronger data isolation.
Analysts warn that Bleeding Llama significantly raises the risks of self-hosted artificial intelligence infrastructure, since unauthenticated attackers can read the active memory space of the Ollama process without prior access or user involvement.
Combined with the platform's widespread adoption among enterprises and developers, the simplicity of exploitation turns the issue from a single software defect into a large-scale exposure concern for organizations whose sensitive workloads rely on locally deployed language models.
Unlike conventional vulnerabilities that cause service disruption, memory disclosure flaws of this nature can silently compromise valuable operational and proprietary data over extended periods.
The researchers note that attackers could extract confidential model weights, enabling intellectual property theft or reconstruction of internally customized AI systems, as well as harvest sensitive prompts, business data, and user inputs processed by active models.
Exposed memory may also reveal infrastructure details, authentication tokens, API credentials, and runtime configuration information that could facilitate further network compromise. Beyond the immediate technical risks, such incidents stand to damage organizations that are integrating artificial intelligence systems into critical operations, particularly where privacy and local data control are central to the deployment.
Security teams across the industry actively tracked the issue even before an official CVE identifier was assigned, an initial gap that complicated the disclosure process.
Defenders recommend that organizations prioritize rapid mitigation: upgrade to patched Ollama releases as soon as they are available, limit public network exposure, enforce strict firewall and access control policies, and run the service under least privilege so that a compromise yields as little access as possible.
Security professionals further recommend continuous monitoring for network anomalies, regular infrastructure audits for misconfigurations, and, in highly sensitive environments, deployment within isolated or segmented networks to shrink the attack surface of internet-accessible artificial intelligence systems.
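One concrete way to apply the access-control advice is to keep Ollama bound to loopback (for example via the OLLAMA_HOST environment variable) and put an authenticating layer in front of it. The sketch below is a minimal token-checking reverse proxy; the token, ports, and bearer scheme are illustrative choices, not an Ollama feature.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Ollama itself listens only on loopback (e.g. OLLAMA_HOST=127.0.0.1:11434).
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	const token = "replace-with-a-long-random-secret" // placeholder

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// Only this authenticated listener is reachable from the network.
	log.Fatal(http.ListenAndServe(":8443", nil))
}
```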
Compounding the disclosure surrounding Bleeding Llama, researchers at Striga have identified two separate vulnerabilities in the Windows implementation of Ollama that can be chained to achieve persistent code execution.
Researchers have determined that the Windows desktop client is automatically launched during login through the Windows Startup folder and listens locally at 127.0.0.1:11434.
The client periodically checks the /api/update endpoint for updates, and any pending installers are executed the next time the application starts.
The chain combines a missing signature verification flaw (CVE-2026-42288) with a path traversal vulnerability (CVE-2026-42249), both of which have been assigned CVSS scores of 7.7.
According to the researchers, installer signatures are not validated before execution, and staging paths are constructed directly from HTTP response headers without proper sanitization, allowing malicious files to be written to attacker-controlled locations.
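The general class of bug, and a standard way to close it, can be sketched briefly. The staging directory and header-derived file names below are hypothetical, and the containment check is a common defensive idiom rather than Ollama's actual fix.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stagePath resolves an update filename under stagingDir and rejects any
// name that would escape it, such as "../../Startup/evil.exe".
func stagePath(stagingDir, nameFromHeader string) (string, error) {
	candidate := filepath.Join(stagingDir, nameFromHeader) // Join also cleans the path
	rel, err := filepath.Rel(stagingDir, candidate)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path traversal attempt: %q", nameFromHeader)
	}
	return candidate, nil
}

func main() {
	dir := filepath.Join("ollama", "updates") // hypothetical staging directory
	for _, name := range []string{"OllamaSetup.exe", "../../Startup/evil.exe"} {
		p, err := stagePath(dir, name)
		fmt.Println(p, err)
	}
}
```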
In scenarios where an adversary can manipulate update responses, for example by redirecting the OLLAMA_UPDATE_URL configuration to a controlled HTTP server, the flaws allow arbitrary executables to be silently deployed and executed at system login; automatic updates remain enabled by default.
The signature verification issue alone allows transient code execution from the staging directory, but combined with the path traversal weakness it yields persistence: payloads can be written outside the expected update path, where subsequent legitimate updates will not overwrite them.
Ollama for Windows versions 0.12.10 through 0.17.5 are affected. Until patches are available, users are advised to disable automatic updates and remove Ollama shortcuts from the Windows Startup directory.
As the Ollama vulnerabilities grow in scope, they point to a broader security challenge across the rapidly evolving artificial intelligence ecosystem, where convenience-driven deployment models increasingly collide with enterprise-grade security expectations.
As organizations increasingly adopt self-hosted large language model infrastructure to retain greater control over sensitive data and inference workloads, researchers warn that insufficient hardening, exposed interfaces, and insecure update mechanisms can turn locally deployed AI environments into high-value attack targets.
Memory disclosure flaws, unauthenticated attack paths, and weaknesses in update workflows make AI infrastructure increasingly attractive to malicious actors, both opportunistic and sophisticated, seeking proprietary models, credentials, and operational intelligence.
Security experts maintain that artificial intelligence platforms can no longer be treated as experimental development tools operating outside traditional security governance; they must be folded into the same rigorous vulnerability management, network segmentation, monitoring, and software lifecycle practices applied to critical enterprise systems.