NVIDIA has issued a renewed advisory encouraging customers to activate System Level Error-Correcting Code (ECC) protections to defend against Rowhammer attacks targeting GPUs equipped with GDDR6 memory.
This heightened warning follows recent research from the University of Toronto demonstrating how practical Rowhammer attacks can be on NVIDIA’s A6000 graphics processor.
“We ran GPUHammer on an NVIDIA RTX A6000 (48 GB GDDR6) across four DRAM banks and observed 8 distinct single-bit flips, and bit-flips across all tested banks,” the researchers explained. “The minimum activation count (TRH) to induce a flip was ~12K, consistent with prior DDR4 findings.”
Using these induced bit flips, the researchers performed what they described as the first machine learning accuracy degradation attack leveraging Rowhammer on a GPU.
Rowhammer exploits a hardware vulnerability where repeatedly accessing a memory row can cause adjacent memory cells to change state, flipping bits from 1 to 0 or vice versa. This can lead to denial-of-service issues, corrupted data, or even potential privilege escalation.
System Level ECC combats such risks by introducing redundant bits that can automatically detect and correct single-bit memory errors, ensuring data remains intact.
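The detect-and-correct idea can be illustrated with a classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits. This is a minimal teaching sketch only: the article does not describe the code NVIDIA's System Level ECC actually uses, and the function names `encode` and `decode` are hypothetical.

```python
# Illustrative Hamming(7,4) code: 4 data bits protected by 3 parity bits.
# Teaching sketch only; not the ECC scheme implemented in NVIDIA GPUs.

def encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome gives the 1-based flipped position
    if position:
        c[position - 1] ^= 1  # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]], position

data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1  # simulate a Rowhammer-style single-bit flip at position 5
recovered, flipped_at = decode(stored)
print(recovered == data, flipped_at)
```

A code like this corrects any single flipped bit per codeword; two flips in the same codeword would be miscorrected, which is why real memory ECC usually adds an extra parity bit to at least detect double-bit errors.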
NVIDIA emphasized that enabling ECC is particularly critical for workstation and data center GPUs, which handle sensitive workloads like AI training and inference, to prevent serious computational errors.
The company’s security bulletin confirmed that researchers “showed a potential Rowhammer attack against an NVIDIA A6000 GPU with GDDR6 Memory” in scenarios where ECC had not been turned on.
The GPUHammer technique developed by the academic team successfully induced bit flips despite GDDR6’s higher latency and faster refresh rates, which generally make Rowhammer attacks harder to mount than on older DDR4 memory.
Researcher Gururaj Saileshwar told BleepingComputer that their demonstration could drop an AI model’s accuracy from 80% to below 1% with just a single bit flip on the A6000.
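To see why one flipped bit can matter so much, consider how floating-point numbers are stored. The sketch below assumes a model weight held as a 32-bit IEEE 754 float with the hypothetical value 0.5; the article does not say which weights or number format the researchers targeted, so this only illustrates the general effect. The helper name `flip_bit` is made up for the example.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit (0 = least significant) in a float32's 32-bit encoding."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

weight = 0.5  # hypothetical model weight, assumed to be stored as float32

low_flip = flip_bit(weight, 0)    # lowest mantissa bit: a negligible change
high_flip = flip_bit(weight, 30)  # top exponent bit: the value explodes

print(weight, "->", low_flip)
print(weight, "->", high_flip)
```

Flipping the top exponent bit turns 0.5 into 2^127 (roughly 1.7e38). A single weight of that magnitude can swamp every other value in the computations it feeds, which is one plausible way a lone bit flip can wreck a model's predictions.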
In addition to the RTX A6000, NVIDIA strongly recommends enabling ECC on the following GPU product lines:
Data Center GPUs:
- Ampere: A100, A40, A30, A16, A10, A2, A800
- Ada: L40S, L40, L4
- Hopper: H100, H200, GH200, H20, H800
- Blackwell: GB200, B200, B100
- Turing: T1000, T600, T400, T4
- Volta: Tesla V100, Tesla V100S
Workstation GPUs:
- Ampere RTX: A6000, A5000, A4500, A4000, A2000, A1000, A400
- Ada RTX: 6000, 5000, 4500, 4000, 4000 SFF, 2000
- Blackwell RTX PRO (latest workstation line)
- Turing RTX: 8000, 6000, 5000, 4000
- Volta: Quadro GV100
Embedded/Industrial:
- Jetson AGX Orin Industrial
- IGX Orin
Newer GPUs, including Blackwell RTX 50 Series, Blackwell Data Center chips, and Hopper Data Center GPUs, feature built-in on-die ECC protection that requires no manual configuration.
To verify whether ECC is active, administrators can use an out-of-band method through the Baseboard Management Controller (BMC) and Redfish API to check the “ECCModeEnabled” status. NVIDIA’s NSM Type 3 and SMBPBI interfaces can also be used for ECC configuration, but accessing them requires NVIDIA Partner Portal access.
Alternatively, ECC can be checked or enabled in-band by running the nvidia-smi command-line tool on the host system.
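The in-band check can be scripted. The sketch below is one possible approach, assuming nvidia-smi is on the PATH and that its `--query-gpu` CSV output uses the values "Enabled" and "Disabled" for the `ecc.mode.current` and `ecc.mode.pending` fields. The helper names and the sample rows are invented for illustration and are not output captured from a real system.

```python
import subprocess

# nvidia-smi invocation that reports the ECC mode of every GPU as CSV rows.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]

def parse_ecc_report(csv_text):
    """Turn nvidia-smi CSV rows into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, current, pending = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name,
                     "ecc_current": current, "ecc_pending": pending})
    return gpus

def gpus_without_ecc(gpus):
    """Return the GPUs whose current ECC mode is anything other than Enabled."""
    return [gpu for gpu in gpus if gpu["ecc_current"] != "Enabled"]

def query_ecc():
    """Run nvidia-smi on this host and parse its report (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_ecc_report(result.stdout)

# Made-up sample rows standing in for real nvidia-smi output.
sample = "0, NVIDIA RTX A6000, Disabled, Disabled\n1, NVIDIA A100, Enabled, Enabled\n"
print(gpus_without_ecc(parse_ecc_report(sample)))
```

On a real host, `gpus_without_ecc(query_ecc())` would list the GPUs that still need attention. ECC is then typically turned on with `nvidia-smi -e 1`, run with administrator privileges; the new mode shows up as pending until the GPU is reset or the system is rebooted.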
Saileshwar noted that enabling these safeguards could slow machine learning inference by around 10% and cut available memory capacity by 6.5% across workloads.
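As a rough worked example, those two figures can be applied to the 48 GB RTX A6000 mentioned earlier. The baseline of 1,000 inferences per second is a hypothetical number chosen for illustration, and treating the slowdown as a flat 10% cut in throughput is a simplifying assumption.

```python
def ecc_overhead(memory_gb, throughput, memory_loss=0.065, slowdown=0.10):
    """Estimate usable memory and throughput after enabling ECC.

    memory_loss and slowdown default to the figures quoted in the article.
    """
    return memory_gb * (1 - memory_loss), throughput * (1 - slowdown)

# 48 GB card, hypothetical baseline of 1,000 inferences per second.
usable_gb, inferences_per_s = ecc_overhead(48, 1000.0)
print(f"{usable_gb:.2f} GB usable, {inferences_per_s:.0f} inferences/s")
```

Under these assumptions, the card would keep about 44.9 GB of usable memory and sustain roughly 900 inferences per second.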
While Rowhammer remains a significant security concern, its exploitation in real-world scenarios is complex. An attack requires highly specific conditions, intensive memory access, and precise control, making it difficult to carry out reliably, especially in production environments.