Potentially disastrous Rowhammer bitflips can bypass ECC protections

A DDR3 DIMM with error-correcting code from Samsung. ECC is no longer an absolute defense against Rowhammer attacks.


In early 2015, researchers unveiled Rowhammer, a cutting-edge hack that exploits unfixable physical weaknesses in the silicon of certain types of memory chips to alter the data they store. In the 42 months since then, error-correcting code (ECC), an enhancement available in higher-end chips, has been widely believed to be an absolute defense against potentially disastrous bitflips that change 0s to 1s and vice versa.

Research published Wednesday has now shattered that assumption.

Dubbed ECCploit, the new Rowhammer attack bypasses ECC protections built into several widely used models of DDR3 chips. The exploit is the product of more than a year of painstaking research that involved injecting faults with syringe needles and supercooling chips to observe how they responded when bits flipped. The resulting insights, along with some advanced math, allowed researchers in Vrije Universiteit Amsterdam’s VUSec group to demonstrate that one of the key defenses against Rowhammer isn’t sufficient.

A major milestone

Importantly, the researchers haven’t demonstrated that ECCploit works against ECC in DDR4 chips, a newer type of memory chip favored by higher-end cloud services. They also haven’t shown that ECCploit can penetrate hypervisors or secondary Rowhammer defenses. Nonetheless, the bypass of ECC is a major milestone that suggests that the threat of Rowhammer continues to evolve and can’t easily be discounted.

“It has so far been assumed that ECC provides a strong protection against Rowhammer attacks,” Kaveh Razavi, one of the VUSec researchers who developed the exploit, told Ars. “ECCploit shows for the first time that it is possible to mount practical Rowhammer attacks on vulnerable ECC DRAM.”

In the research paper, the researchers wrote:

Rowhammer has evolved into a serious threat to computer systems, from the smallest mobile devices to very large clouds, but so far machinery with high-end memory with error correcting code (ECC) has been free from such attacks. This has been due to the complex challenge of reverse-engineering commodity ECC functions and, more importantly, to the narrow margins within which attackers must operate: multiple bits must flip in order to bypass the error-correcting functionality, but flipping the wrong number of bits may crash the system. Thus, many believed that Rowhammer on ECC memory, even if plausible in theory, is simply impractical. This paper shows this to be false: while harder, Rowhammer attacks are still a realistic threat even to modern ECC-equipped systems. This is particularly worrying, because all other existing defenses have already been proven insecure. Given the proliferation of Rowhammer vulnerabilities across a broad range of systems, we urgently need better defenses against these attacks.

To review, DDR memory is laid out in an array of rows and columns that are assigned in large blocks to various applications and operating-system resources. To protect the integrity and security of the entire system, each allocated chunk of memory is contained in a “sandbox” that can be accessed only by a given app or OS process.

As the physical dimensions of chips have shrunk over time, DRAM cells have been packed ever closer together. The tight quarters threaten this security model because they make it increasingly hard to prevent a cell assigned to one app or process from interacting electrically with neighboring cells assigned to a different app or process.

Rowhammer exploits this physical weakness by rapidly accessing, or “hammering,” one or more carefully selected rows inside a vulnerable DIMM. By reading one or more “aggressor” rows of memory hundreds of thousands of times within the 64-millisecond window between DRAM refreshes, the exploit can flip one or more bits in a “victim” location in a neighboring row. When done with precision, Rowhammer can flip bits in ways that have major consequences for security, for instance, by allowing an untrusted app to gain full administrative rights, break out of security sandboxes or virtual-machine hypervisors, or root devices that use the vulnerable DIMM.
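The toy model below illustrates why attackers choose aggressor rows carefully. It is a sketch under loudly stated assumptions: rows are numbered linearly, only the immediately adjacent rows are disturbed, and `DISTURB_THRESHOLD` is a made-up number of activations per refresh window. Real DIMMs map addresses to rows in undocumented ways, and their tolerance varies widely from module to module.

```python
"""Toy model of Rowhammer disturbance (illustrative only).

Hypothetical assumptions, not from the article: rows are numbered linearly,
only immediately adjacent rows are disturbed, and a row becomes flip-prone
once its neighbours are activated more than DISTURB_THRESHOLD times within
one refresh window.
"""
from __future__ import annotations

from collections import Counter

DISTURB_THRESHOLD = 100_000  # hypothetical; real DIMMs vary widely


def hammer(aggressor_rows: list[int], rounds: int) -> Counter:
    """Count the activations each aggressor row receives in one refresh window."""
    activations: Counter = Counter()
    for _ in range(rounds):
        for row in aggressor_rows:
            activations[row] += 1
    return activations


def flip_prone_rows(activations: Counter) -> set[int]:
    """Rows whose neighbours' combined activations exceed the threshold.

    In this model, an activated row is itself recharged, so aggressor rows
    are never counted as victims.
    """
    pressure: Counter = Counter()
    for row, count in activations.items():
        for neighbour in (row - 1, row + 1):
            pressure[neighbour] += count
    return {row for row, p in pressure.items()
            if p > DISTURB_THRESHOLD and row not in activations}


print(flip_prone_rows(hammer([9, 11], 60_000)))  # prints {10}
print(flip_prone_rows(hammer([9], 60_000)))      # prints set()
```

In this model, hammering the rows on both sides of a victim concentrates the disturbance on it: rows 9 and 11 together push row 10 past the threshold, while row 9 alone, hammered for the same number of rounds, does not.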

ECC: Some restrictions apply

ECC works by storing redundant control bits alongside the data bits in each memory word inside the DIMM. The memory controller uses these bits to quickly detect and repair flipped bits. ECC was originally designed to protect against naturally occurring bitflips, such as those caused by cosmic rays. After Rowhammer appeared, ECC’s importance grew as it came to be regarded as the most effective defense.

But some limitations apply. ECC generally adds enough redundancy to repair a single bitflip in a 64-bit word. When two bitflips occur in a word, ECC can detect them but not repair them, which typically crashes the underlying program or the whole system. When three bitflips occur in the right places, the corruption can slip past ECC entirely.
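The one/two/three-flip behavior follows from how single-error-correcting, double-error-detecting (SECDED) codes work. The sketch below uses a textbook extended Hamming code that protects 64 data bits with 8 check bits. That layout is an assumption for illustration only: the ECC functions in commodity memory controllers aren’t publicly documented, which is why the VUSec team had to reverse-engineer them. The flipped positions 3, 5, and 6 are hand-picked so that their syndromes cancel out.

```python
"""Textbook SECDED sketch: an extended Hamming (72,64) code.

Assumption: commodity memory controllers use their own, undocumented ECC
functions; this generic layout is for illustration, not any vendor's code.
"""
from __future__ import annotations

N_POS = 72  # position 0 = overall parity bit, positions 1..71 = Hamming part
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
DATA_POSITIONS = [p for p in range(1, N_POS) if p not in PARITY_POSITIONS]
assert len(DATA_POSITIONS) == 64  # 64 data bits + 8 check bits


def encode(data: int) -> list[int]:
    bits = [0] * N_POS
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    # Each Hamming check bit makes the parity of the positions it covers even.
    for p in PARITY_POSITIONS:
        bits[p] = sum(bits[q] for q in range(1, N_POS) if q & p) % 2
    bits[0] = sum(bits[1:]) % 2  # overall parity over the whole word
    return bits


def decode(bits: list[int]) -> tuple[str, int | None]:
    syndrome = 0
    for q in range(1, N_POS):
        if bits[q]:
            syndrome ^= q  # XOR of set positions; 0 for a valid codeword
    overall_odd = sum(bits) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status, fixed = "ok", bits
    elif overall_odd and syndrome < N_POS:
        # Odd parity: the decoder assumes exactly one flip and repairs it.
        fixed = list(bits)
        fixed[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome (two flips), or a syndrome that
        # points outside the word: detected, but not correctable.
        return "uncorrectable", None
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= fixed[pos] << i
    return status, data


def flip(bits: list[int], positions: list[int]) -> list[int]:
    out = list(bits)
    for q in positions:
        out[q] ^= 1
    return out


original = 0xDEADBEEFCAFEF00D
word = encode(original)
for label, flips in [("one flip", [3]), ("two flips", [3, 5]), ("three flips", [3, 5, 6])]:
    status, data = decode(flip(word, flips))
    print(f"{label}: {status}, data intact: {data == original}")
# prints:
# one flip: corrected, data intact: True
# two flips: uncorrectable, data intact: False
# three flips: corrected, data intact: False
```

With three flips, this decoder believes it has repaired a single error and hands back data that differs from what was written, with no crash to warn anyone.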

Until now, there has been little public knowledge about how commodity memory controllers implement ECC. The VUSec researchers spent months reverse-engineering it, in part by using syringe needles to inject faults on the memory bus and by subjecting chips to a cold-boot attack. By extracting data stored inside the supercooled chips as they experienced the errors, the researchers were able to learn how memory controllers processed ECC control bits.

Here’s a video of the researchers using the cold-boot technique:

Cold-Boot attack for reverse-engineering the error-correcting code (ECC).

And here’s a video of syringe needles injecting faults:

Memory bus fault injection with two syringe needles.

The researchers eventually discovered a timing side channel. By carefully timing memory reads, they could tell when ECC had silently corrected a bitflip inside the silicon. In a blog post, the researchers wrote:

Armed with this knowledge, we then proceeded to show that ECC merely slows down the Rowhammer attack and is not enough to stop it. Intuitively, the approach is fairly straightforward. Recall that we need three bitflips, while avoiding a situation in which only two bitflips occur. The first thing we discovered was a technique to ensure that, at most, one particular bitflip occurs in a memory word. The trick is simple: we make sure that all bits in the location that we hammer and the bits in the location that we want to attack are the same, except one. If the bits at the same position in the two locations are the same, no bitflip will occur. If they are different, the bit may flip. So we can independently try and flip first bit 1, then bit 2, then bit 3, etc. At first sight, that seems pointless. After all, ECC will simply correct that bitflip and it would seem as if nothing happened.

A timely trick

Phrased differently: one flip is no flip. However, this is not entirely true. What we found is that we can detect that a bit has been corrected by means of a timing side channel. Simply put: it will typically take measurably longer to read from a memory location where a bitflip needs to be corrected than it takes to read from an address where no correction was needed. Thus, we can try each bit in turn until we find a word in which we could flip three bits that are vulnerable. The final step is then to make all three bits in the two locations different and hammer one final time, to flip all three bits in one go: mission accomplished.
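The search strategy in the quoted passage can be sketched as follows. Everything hardware-facing here is hypothetical: `probe_was_corrected` stands in for hammering plus the timing side channel, and it consults a made-up table of vulnerable bits instead of measuring read latency. The sketch also ignores a real constraint: the three flips must land in positions that actually defeat the specific ECC function in use.

```python
"""Sketch of the search strategy described in the VUSec blog post.

Hypothetical pieces (not from the article): probe_was_corrected stands in
for hammering an aggressor and then timing a read of the victim word; it is
backed by a made-up table of vulnerable bits so the logic can run.
"""
from __future__ import annotations

WORD_BITS = 64

# Hypothetical ground truth: victim word address -> bits that can flip there.
SIMULATED_VULNERABLE_BITS = {
    0x1000: {7},
    0x2000: {2, 40},
    0x3000: {1, 9, 33, 60},
}


def aggressor_pattern(victim_value: int, bits: list[int]) -> int:
    """Aggressor data equal to the victim except at `bits`.

    Per the blog, a bit can only flip where the two locations differ, so this
    controls exactly which bits are at risk.
    """
    pattern = victim_value
    for b in bits:
        pattern ^= 1 << b
    return pattern


def probe_was_corrected(address: int, victim_value: int, aggressor_value: int) -> bool:
    """Stand-in for the timing oracle: True if the read was slow because ECC
    had to correct exactly one flipped bit."""
    at_risk = victim_value ^ aggressor_value
    flipped = [b for b in SIMULATED_VULNERABLE_BITS.get(address, set())
               if (at_risk >> b) & 1]
    return len(flipped) == 1  # corrected, but the extra latency gives it away


def find_three_flippable_bits(addresses: list[int], victim_value: int):
    """Probe one bit at a time; return the first word with three flippable bits."""
    for address in addresses:
        found = []
        for bit in range(WORD_BITS):
            aggressor = aggressor_pattern(victim_value, [bit])
            if probe_was_corrected(address, victim_value, aggressor):
                found.append(bit)
                if len(found) == 3:
                    return address, found
    return None


victim_value = 0
address, bits = find_three_flippable_bits(sorted(SIMULATED_VULNERABLE_BITS), victim_value)
final_aggressor = aggressor_pattern(victim_value, bits)
print(hex(address), bits, hex(final_aggressor))  # prints 0x3000 [1, 9, 33] 0x200000202
```

The last step builds the final aggressor pattern, which differs from the victim at all three discovered bits so that one more round of hammering can flip them together, as the blog post describes.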

No imminent threat

The researchers tested ECCploit on four hardware platforms:

  • AMD Opteron 6376 Bulldozer (15h)
  • Intel Xeon E3-1270 v3 Haswell
  • Intel Xeon E5-2650 v1 Sandy Bridge
  • Intel Xeon E5-2620 v1 Sandy Bridge

The researchers said they tested “several memory modules from different manufacturers” and confirmed that a significant number of Rowhammer bitflips occurred in a type of DIMM tested by a different team of researchers. The VUSec researchers declined to identify the DIMM manufacturers.

As noted earlier, ECCploit focuses on DDR3 DIMMs (although, in fairness, the researchers said they believe a similar telltale side channel exists in DDR4). There’s also no indication that ECCploit works reliably against endpoints typically used in cloud environments such as AWS or Microsoft Azure.

In a statement, a Microsoft official wrote: “We continually monitor and test the security of our services against Rowhammer attacks, including worst-case attack conditions that go beyond realistic scenarios. This testing includes the techniques described in this paper, which do not pose a threat to our services.” The statement didn’t elaborate. Amazon officials didn’t respond to an email seeking comment for this post.

The takeaway: while ECCploit represents a significant advance that may (a) leave some servers vulnerable or (b) open systems to future attacks, there’s no indication ECCploit currently poses an imminent threat to the large cloud providers.

“Overall, this is impressive work that will help hardware manufacturers improve their defenses for this class of attacks, but we don’t (yet) have direct evidence of any widespread vulnerability on the major public cloud providers,” Kenn White, an independent researcher who specializes in cloud security, told Ars. “I don’t want to come across as a grumpy guy in the balcony, because this is grueling work that took hundreds of hours to pull off. But unless you can demonstrate a real exploit, it remains in the confines of endpoints and on-premise hardware.”
