Another day, another speculative execution-based attack. Data protected by Intel’s SGX—data that’s meant to be protected even from a malicious or hacked kernel—can be read by an attacker thanks to leaks enabled by speculative execution.
Since publication of the Spectre and Meltdown attacks in January this year, security researchers have been taking a close look at speculative execution and the implications it has for security. All high-speed processors today perform speculative execution: they assume certain things (a register will contain a particular value, a branch will go a particular way) and perform calculations on the basis of those assumptions. It’s an important design feature of these chips that’s essential to their performance, and it has been for 20 years.
What’s in store today? A new Meltdown-inspired attack on Intel’s SGX, given the name Foreshadow by the researchers who found it. Two groups of researchers found the vulnerability independently: a team from KU Leuven in Belgium reported it to Intel in early January—just before Meltdown and Spectre went public—and a second team from the University of Michigan, University of Adelaide, and Technion reported it three weeks later.
SGX, standing for Software Guard eXtensions, is a new feature that Intel introduced with its Skylake processors that enables the creation of Trusted Execution Environments (TEEs). TEEs are secure environments where both the code and the data the code works with are protected to ensure their confidentiality (nothing else on the system can spy on them) and integrity (any tampering with the code or data can be detected). SGX is used to create what are called enclaves: secure blocks of memory containing code and data. The contents of an enclave are transparently encrypted every time they’re written to RAM and decrypted on being read. The processor governs access to the enclave memory: any attempt to access the enclave’s memory from outside the enclave should be blocked.
The value that SGX offers is that it allows these secure environments to be created without having to trust the integrity of the operating system, hypervisor, or any other layers of the system. The processor itself validates and protects the enclave, so as long as the processor is trusted, the enclave can be trusted. This is attractive in, for example, cloud-hosting scenarios: while most people trust that the cloud host isn’t malicious and isn’t spying on sensitive data used on its systems, SGX removes the need to assume. Even if the hypervisor and operating system are compromised, the integrity and confidentiality of the enclave should be unaffected.
And that’s where Foreshadow comes into play.
Foreshadow was, er, foreshadowed
All of these speculative execution attacks follow a common set of principles. Each processor has an architectural behavior (the documented behavior that describes how the instructions work and that programmers depend on to write their programs) and a microarchitectural behavior (the way an actual implementation of the architecture behaves). These can diverge in subtle ways. For example, architecturally, a program that performs a conditional branch (that is: comparing the contents of two registers and using that comparison to determine which piece of code to execute next) will wait until the condition is known before making the branch. Microarchitecturally, however, the processor might try to speculatively guess at the result of the comparison so that it can perform the branch and continue executing instructions without having to wait.
If the processor guesses wrong, it will roll back the extra work it did and take the correct branch. The architecturally defined behavior is thus preserved. But that faulty guess will disturb other parts of the processor—in particular, the contents of the cache. The guessed-at branch can cause data to be loaded into the cache, for example (or, conversely, it can push other data out of the cache). These microarchitectural disturbances can be detected and measured—loading data from memory is quicker if it’s already in the cache. This allows a malicious program to make inferences about the values stored in memory.
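To make the timing inference concrete, here is a toy model in Python. It is a sketch under loud assumptions, not an attack: the cache is a plain set, the latencies are invented constants, and `ToyCache`, `mispredicted_branch`, and `recover_byte` are hypothetical names. It shows only the principle that a discarded guess leaves a cache footprint, and that timing each candidate line reveals which one was touched.

```python
# Toy model of a cache timing side channel. Nothing here touches real hardware.
CACHE_HIT_NS = 10     # assumed latency for a line already in the cache
CACHE_MISS_NS = 200   # assumed latency for a line fetched from RAM


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        """Touch a line and return the simulated latency of that access."""
        latency = CACHE_HIT_NS if line in self.lines else CACHE_MISS_NS
        self.lines.add(line)
        return latency


def mispredicted_branch(cache, secret_byte):
    """Work done on a wrong guess: results are discarded, the cache line stays."""
    cache.access(("probe", secret_byte))


def recover_byte(cache):
    """Time all 256 probe lines; the single fast one reveals the byte."""
    timings = {value: cache.access(("probe", value)) for value in range(256)}
    return min(timings, key=timings.get)


cache = ToyCache()
cache.flush()
mispredicted_branch(cache, secret_byte=0x2A)
leaked = recover_byte(cache)
```

In this model `leaked` ends up equal to the byte the discarded branch touched, even though the attacker's code never read that byte directly.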
The closest precursor to the new Foreshadow attack is Meltdown. With Meltdown, an attacker would try to read kernel memory from a user program. The processor prohibits this—the permissions for kernel memory don’t allow it to be read from user programs—but the prohibition isn’t instant. Execution continues speculatively for a few instructions past the illegal read, and the contents of cache can be modified by that execution. When the processor notices that the read was illegal, it generates an exception and rolls back the speculated execution. But the modifications to cache can be detected, and this can be used to infer the contents of kernel memory.
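The ordering problem at the heart of Meltdown can be sketched the same way. The class below is a hypothetical toy, assuming a processor whose permission check completes only after a dependent load has already run; `ToyCPU` and its fields are illustrative names and do not correspond to any real interface.

```python
class ToyCPU:
    """Toy model: the permission check completes after a dependent load has run."""

    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory   # address -> byte, off limits to user code
        self.cached_probe_lines = set()      # microarchitectural state
        self.registers = {}                  # architectural state

    def user_read_kernel(self, address):
        # Transient window: the value is forwarded before the check finishes.
        value = self.kernel_memory[address]
        self.registers["tmp"] = value
        self.cached_probe_lines.add(value)   # dependent load touches probe[value]
        # The check finishes: architectural effects are rolled back...
        self.registers.clear()
        # ...but the cache is not, and the program sees only a fault.
        raise PermissionError(f"user-mode read of kernel address {address:#x}")


cpu = ToyCPU({0xFFFF0000: 0x53})
try:
    cpu.user_read_kernel(0xFFFF0000)
    faulted = False
except PermissionError:
    faulted = True

# The program never received the value, yet the cache footprint encodes it.
inferred = next(iter(cpu.cached_probe_lines))
```

The program gets an exception and empty registers, which is the architecturally correct outcome, while the set of cached probe lines still identifies the byte.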
For Foreshadow, the data of interest is the encrypted data in the enclave. The overall pattern is the same: attempt to read enclave memory from outside the enclave, allow speculative execution to modify the cache based on the data that was read, and then have the processor abort the speculation when it realizes that it’s protected-enclave memory and that reading it isn’t allowed. The attack depends on the fact that only data in main memory is encrypted: once it’s inside the processor in a cache, it’s decrypted. Specifically, if the data is in level 1 cache, the speculative execution can use it before the processor determines that there’s no permission to use it.
More complicated than Meltdown
The details of the Foreshadow attack are a little more complicated than those of Meltdown. In Meltdown, the attempt to perform an illegal read of kernel memory triggers the page fault mechanism (by which the processor and operating system cooperate to determine which bit of physical memory a memory access corresponds to, or they crash the program if there’s no such mapping). Attempts to read SGX data from outside an enclave receive special handling by the processor: reads always return a specific value (-1), and writes are ignored completely. The special handling is called “abort page semantics” and should be enough to prevent speculative reads from being able to learn anything.
However, the Foreshadow researchers found a way to bypass the abort page semantics. The data structures used to control the mapping of virtual-memory addresses to physical addresses include a flag to say whether a piece of memory is present (loaded into RAM somewhere) or not. If memory is marked as not being present at all, the processor stops performing any further permissions checks and immediately triggers the page fault mechanism: this means that the abort page semantics aren’t used. It turns out that applications can mark memory, including enclave memory, as not being present by removing all permissions (read, write, execute) from that memory.
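The order of those checks is the whole trick, so a small model may help. This is a sketch, assuming a simplified page-table entry with only three fields; `PageTableEntry`, `Access`, `load`, and the `is_enclave` flag are hypothetical stand-ins for logic that lives in hardware, and the L1 cache is a dictionary keyed by physical frame.

```python
from typing import NamedTuple, Optional

ABORT_PAGE_BYTE = 0xFF  # an abort-page read returns all ones, i.e. -1


class PageTableEntry(NamedTuple):
    present: bool
    frame: int         # physical frame the entry points at
    is_enclave: bool   # hypothetical stand-in for the processor's SGX check


class Access(NamedTuple):
    fault: bool
    value: Optional[int]      # what the program architecturally receives
    transient: Optional[int]  # what speculation could briefly see


def load(pte, l1_cache, inside_enclave=False):
    if not pte.present:
        # Terminal fault: the walk stops here, so the enclave check never runs.
        # Speculation can still be fed whatever L1 holds for that frame.
        return Access(fault=True, value=None, transient=l1_cache.get(pte.frame))
    if pte.is_enclave and not inside_enclave:
        return Access(fault=False, value=ABORT_PAGE_BYTE, transient=None)
    return Access(fault=False, value=l1_cache.get(pte.frame), transient=None)


l1 = {0x42: 0x7E}  # decrypted enclave byte sitting in L1, keyed by frame
normal = load(PageTableEntry(present=True, frame=0x42, is_enclave=True), l1)
attack = load(PageTableEntry(present=False, frame=0x42, is_enclave=True), l1)
cold = load(PageTableEntry(present=False, frame=0x42, is_enclave=True), {})
```

With the present flag set, the outside reader gets the abort-page value and nothing else. With it cleared, the read faults but the transient path sees the cached byte. The `cold` case shows why the attack needs the data to be in level 1 cache: with nothing cached for that frame, there is nothing to forward.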
Additional techniques were also devised to reduce the chance of data in level 1 cache being overwritten during the attack and increase the amount of information that can be read. With a malicious kernel driver, the full contents of the enclave can be read. Normally “with a kernel driver” isn’t an interesting attack vector—kernel code is meant to be able to do more or less anything anyway—but SGX is explicitly meant to protect secrets even in the face of a hostile, compromised kernel.
As such, data that should be secret and encrypted and visible only to trusted SGX code can be read by an attacker. Moreover, by using Foreshadow to read data from special Intel-provided enclaves, an attacker can fraudulently create their own enclaves with compromised integrity. There are also additional risks if multiple enclaves are running simultaneously in different hyperthreads on the same physical core; one enclave can attack the other.
The researchers stress that their work doesn’t undermine the basic design of SGX; Foreshadow is a quirk of the way speculative execution interacts with SGX, and, with that quirk resolved, the security of the system is restored (though historic encrypted data could potentially have been tampered with).
When the attack was reported to Intel, the company performed its own investigation. It discovered that SGX data isn’t the only thing that’s at risk. The processor also has other specially protected zones of memory: the Extended Page Tables used by hypervisors, and memory used by System Management Mode (SMM), which can be used for power management or other low-level functions. As with the SGX data, the EPT and SMM data that’s held in level 1 cache can be speculatively read and, hence, leaked to an attacker if memory is marked as being not present.
Normally, access to EPT memory undergoes additional translation into a physical address, and access to SMM memory has a special permissions check to ensure the processor is in management mode. But when memory is marked as not present, the permissions-checking terminates early, bypassing this special handling.
Intel has thus dubbed the flaw the “Level 1 Terminal Fault” (L1TF): data in level 1 cache can be leaked because the permissions check terminates too soon.
The good news? Big parts are fixed already
As with many of the other speculative execution issues, a large part of the fix comes in the form of microcode updates, and in this case, the microcode updates have already been released and have been in the wild for some weeks. With the updated microcode, every time the processor leaves execution of an enclave, it also flushes the level 1 cache. With no data in level 1 cache, there’s no scope for the L1TF to take effect. Similarly, with the new microcode, leaving management mode flushes the level 1 cache, protecting SMM data.
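Why a flush on exit closes the hole can be shown with another toy. The sketch below assumes a single-threaded core whose L1 cache is a dictionary; `ToyCore` and the `flush_l1_on_enclave_exit` switch are hypothetical names for the before and after microcode behavior described above.

```python
class ToyCore:
    """Toy single-threaded core; l1 maps a physical frame to plaintext bytes."""

    def __init__(self, flush_l1_on_enclave_exit):
        self.flush_l1_on_enclave_exit = flush_l1_on_enclave_exit
        self.l1 = {}

    def run_enclave(self, frame, secret):
        self.l1[frame] = secret            # enclave work leaves plaintext in L1
        if self.flush_l1_on_enclave_exit:  # behavior attributed to the new microcode
            self.l1.clear()

    def transient_read(self, frame):
        """What a terminal-fault read could forward: only data still in L1."""
        return self.l1.get(frame)


old = ToyCore(flush_l1_on_enclave_exit=False)
new = ToyCore(flush_l1_on_enclave_exit=True)
for core in (old, new):
    core.run_enclave(frame=0x1234, secret=b"sealing key")

leak_before = old.transient_read(0x1234)
leak_after = new.transient_read(0x1234)
```

The unpatched core still has the plaintext available after the enclave exits; the patched one has nothing left to leak. The model has one thread only, so it says nothing about a sibling hyperthread running while the enclave is still active.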
The microcode also gives operating systems the ability to completely flush the level 1 data cache (without altering any other cache). Hypervisors can insert these flushes at certain points to protect the EPT data. Operating systems should also be updated to ensure that their mapping from virtual addresses to physical addresses follows certain rules so that secret data can never find itself in level 1 cache inadvertently.
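On Linux, one way to see whether those operating system changes are in place is the kernel's per-vulnerability status file. The sketch below assumes a kernel new enough to expose `/sys/devices/system/cpu/vulnerabilities/l1tf`; the helper names are hypothetical, and the sample string is an illustration of the general format rather than output captured from a particular machine.

```python
from pathlib import Path

L1TF_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def parse_l1tf_status(text):
    """Split a status line into a headline state and its detail fields."""
    state, _, rest = text.strip().partition(":")
    details = [part.strip() for part in rest.split(";") if part.strip()]
    return {"state": state.strip(), "details": details}


def read_l1tf_status(path=L1TF_STATUS):
    """Return the parsed status, or None where the kernel does not expose it."""
    try:
        return parse_l1tf_status(path.read_text())
    except OSError:
        return None


# Illustrative sample only; the real wording depends on kernel and hardware.
example = parse_l1tf_status(
    "Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable\n"
)
```

Calling `read_l1tf_status()` on a real host returns `None` if the file is missing, which is the expected result on older kernels and on other operating systems.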
These measures don’t, however, completely eliminate the risks, especially when hyperthreading is used. With hyperthreading, one logical core can be within SGX, hypervisor, or SMM code, while the other logical core is not. The other logical core can thus snoop on level 1 cache, and the extra cache flushes can’t prevent this (though they can certainly make it less convenient, due to the increased chance of a flush occurring during an attack).
This concern is particularly acute with virtual machines: if two virtual machines share a physical core, then the virtual machine using one logical core can potentially spy on the virtual machine using the other logical core. One option here is to disable hyperthreading on virtual-machine hosts. The alternative is to ensure that virtual machines are bound to physical cores such that they don’t share.
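The second option amounts to a bookkeeping exercise: group logical CPUs by physical core, then hand out whole cores. A minimal sketch, assuming sibling lists in the comma-and-range format Linux uses in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the function names, the VM names, and the 4-core host are all hypothetical.

```python
def parse_cpu_list(text):
    """Parse a CPU list such as '0,4' or '2-3' into a sorted tuple."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        elif part:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def physical_cores(sibling_lists):
    """Collapse per-CPU sibling lists into one tuple of logical CPUs per core."""
    return sorted({parse_cpu_list(text) for text in sibling_lists})


def assign_vms(vm_names, cores):
    """Give each VM a whole physical core, so no two VMs share hyperthreads."""
    if len(vm_names) > len(cores):
        raise ValueError("more VMs than physical cores; some would have to share")
    return dict(zip(vm_names, cores))


# Hypothetical 4-core, 8-thread host: CPU n and CPU n+4 are siblings.
siblings = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
cores = physical_cores(siblings)
plan = assign_vms(["vm-a", "vm-b"], cores)
```

The resulting plan maps each VM to both logical CPUs of one core. Applying it is left to whatever pinning mechanism the hypervisor provides, and the function refuses outright when there are more VMs than cores instead of quietly letting two of them share.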
For SGX data, however, the L1TF risk with hyperthreading enabled can’t be completely eliminated.
Longer term, Intel promises to fix the issue in hardware. Cascade Lake processors, due to ship later this year, will not suffer the L1TF (or Meltdown) issues at all, suggesting that the new processors will change how they handle permissions checks so that speculative execution can no longer run ahead of them.