July 31, 2018

Improved CPU Microcode Loading

Modern CPUs rely on microcode to control many aspects of their behavior. Microcode updates, provided either by system firmware (BIOS) or by the operating system, correct issues in CPU operation that are discovered after the CPU is already in production use.

FreeBSD has long supported run-time microcode loading, although in a somewhat cumbersome fashion. Currently, a userland tool uses a special kernel interface to inject new microcode, which has a couple of downsides. First, microcode is not loaded until well after the kernel has booted, which means that any security or stability improvements provided by a microcode update are not available until late in the boot process. Second, the microcode may revert to the version provided by the system firmware after suspend and resume, which is particularly problematic if the new microcode implements new features or control registers (because the registers will “disappear” after resume).

To address this, under sponsorship from the FreeBSD Foundation, I am implementing in-kernel microcode loading. The aim is to apply microcode updates as one of the first stages in the kernel’s boot-up process. In particular, since microcode updates may enable new CPU features such as IBRS, it is desirable to ensure that updates are applied before the kernel enumerates these features. As part of this feature, the kernel will automatically re-apply any existing microcode update on each CPU upon resume, so only minimal portions of the kernel may ever execute without an update applied.

The existing interface requires a userland program, typically cpucontrol(8), to provide a copy of the microcode update file to the kernel. The new update mechanism cannot rely on any userspace functionality, so we instead use the FreeBSD boot loader to copy the microcode update file into kernel memory, along with the kernel itself, before relinquishing the CPU to the kernel. This means that microcode updates are configured by simply adding a line to loader.conf which specifies the path to the update file.

When booting a multi-core system, only a single CPU (the “bootstrap processor,” or BSP) is initially enabled; one of the kernel’s boot-time tasks is to initialize and start the remaining CPUs (the “application processors,” or APs). In my patch, one of the first actions of the BSP is to look for a loader-provided microcode update file, search the file for an update which applies to the running system, and load it if one is found. When APs start up, they each check whether the BSP successfully found and loaded an update, and apply it themselves if so. Similarly, upon resuming from an ACPI suspend operation, each CPU will re-load the update originally selected by the BSP during boot.
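The division of labor between the BSP and the APs can be modeled in a few lines. This is a toy illustration rather than code from the patch: the `Cpu` and `EarlyMicrocode` names are hypothetical, an update is reduced to a (signature, revision) pair, and a revision of 0 stands in for "firmware-provided microcode only".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for one update: (processor signature, revision).
Update = Tuple[int, int]


@dataclass
class Cpu:
    cpu_id: int
    signature: int
    revision: int = 0  # 0 models "firmware-provided microcode only"


class EarlyMicrocode:
    """Toy model: the BSP selects an update once; other CPUs reuse its choice."""

    def __init__(self, preloaded: List[Update]) -> None:
        self.preloaded = preloaded  # data handed over by the boot loader
        self.selected: Optional[Update] = None

    def bsp_init(self, bsp: Cpu) -> None:
        # Only the BSP searches the preloaded data for a newer matching update.
        matches = [u for u in self.preloaded
                   if u[0] == bsp.signature and u[1] > bsp.revision]
        if matches:
            self.selected = max(matches, key=lambda u: u[1])
            self.apply(bsp)

    def apply(self, cpu: Cpu) -> None:
        # Called by each AP at startup and by every CPU on resume.
        # Does nothing if the BSP did not find an applicable update.
        if self.selected is not None:
            cpu.revision = self.selected[1]
```

In this model the BSP calls `bsp_init` once, each AP calls `apply` as it starts, and every CPU calls `apply` again after resume, so the search through the update file happens only once.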

The new code, which does the actual work of identifying an applicable update and loading it, is relatively straightforward. However, because it runs very early in the kernel’s boot-up procedure, it is subject to constraints which do not apply to most of the rest of the kernel. For example, the Intel SDM specifies that microcode update data must be aligned to a 16-byte boundary, but the loader does not guarantee any particular alignment for raw file data when it places that data in kernel memory. This means that either the loader must be taught to load microcode updates at a suitable alignment, which introduces a subtle dependency between the loader and kernel, or the kernel must make a copy of the update at an aligned address. Unfortunately, the latter option is a bit tricky because microcode updates must be loaded before the kernel memory allocator is initialized! As another example, the microcode loading code cannot access any per-CPU variables, since updates are loaded before any per-CPU data structures are created. This complicates the task of preventing multiple hardware threads on the same physical core from simultaneously attempting a microcode update.
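The alignment problem comes down to address arithmetic. The sketch below models memory as a `bytearray` and addresses as offsets into it. It assumes a scratch region reserved ahead of time, since no allocator is available; how the actual patch obtains that memory is not described here. The names `align_up` and `place_aligned` are hypothetical.

```python
ALIGNMENT = 16  # alignment required for microcode update data


def align_up(addr: int, alignment: int = ALIGNMENT) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def place_aligned(memory: bytearray, src: int, size: int,
                  scratch: int, scratch_len: int) -> int:
    """Return an aligned address holding the update, copying only if needed."""
    if src % ALIGNMENT == 0:
        return src  # the loader happened to place the data at a usable address
    # Assumption: [scratch, scratch + scratch_len) was reserved in advance.
    dst = align_up(scratch)
    if dst + size > scratch + scratch_len:
        raise ValueError("scratch region too small for aligned copy")
    memory[dst:dst + size] = memory[src:src + size]
    return dst
```

Rounding up can skip as many as 15 bytes at the start of the scratch region, so the region needs to be at least that much larger than the biggest update it may hold.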

One complication of early loading is that Intel releases many different microcode update files; each file targets a specific CPU <family, model, stepping> tuple. Each file begins with a fixed-length header which encodes information about the target CPU type, so this header must be parsed and its fields compared with information about the running CPU to determine whether the update can be applied. In the existing update mechanism, cpucontrol(8) does this work and selects from among the microcode update files provided by the sysutils/devcpu-data port. For the new mechanism, we had to choose from among several options:
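As an illustration of that header check, here is a minimal sketch based on the 48-byte header layout documented in the Intel SDM. It is not the code from the patch: the helper names are hypothetical, the optional extended signature table is ignored, and the revision comparison is simplified to an unsigned one.

```python
import struct
from typing import NamedTuple, Tuple

# Nine little-endian 32-bit fields followed by 12 reserved bytes (48 in total).
HEADER = struct.Struct("<9I12x")


class UcodeHeader(NamedTuple):
    header_version: int
    update_revision: int
    date: int
    processor_signature: int
    checksum: int
    loader_revision: int
    processor_flags: int
    data_size: int
    total_size: int


def parse_header(buf: bytes) -> UcodeHeader:
    """Decode the fixed-length header at the start of an update."""
    return UcodeHeader(*HEADER.unpack_from(buf, 0))


def decode_signature(sig: int) -> Tuple[int, int, int]:
    """Split a CPUID signature into (family, model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    if family in (0x6, 0xF):
        model |= ((sig >> 16) & 0xF) << 4  # extended model bits
    if family == 0xF:
        family += (sig >> 20) & 0xFF       # extended family bits
    return family, model, stepping


def applies_to(hdr: UcodeHeader, cpu_signature: int, platform_id: int,
               current_revision: int) -> bool:
    """True if the update targets this CPU and is newer than what is loaded."""
    return (hdr.processor_signature == cpu_signature
            and hdr.processor_flags & (1 << platform_id) != 0
            and hdr.update_revision > current_revision)
```

The header stores the raw CPUID signature, so a match is a plain integer comparison; `decode_signature` is only needed to present the result as a <family, model, stepping> tuple.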

  1. Configure the loader to load only the microcode update file for the current CPU. This works fine for many deployments, but not in the case where the same disk image is shared among multiple systems with different CPU types. This can arise when a cluster of systems is configured to boot disklessly from a common filesystem, or when a disk image is transferred from one computer to another.
  2. Concatenate all updates into a single file, load the result and select the correct update to apply during boot. This addresses the shortcoming of the first solution, but introduces some complexity in the kernel (which now must iterate over all updates in the file) and wastes memory storing updates which will never be used. The combined set of microcode updates for Intel weighs in at about 2 MB and will only grow over time. For many systems this is a negligible amount of RAM, but it is significant on smaller ones. Not only is physical memory wasted this way, but so is the virtual address space needed to map the update data; on 32-bit systems, kernel virtual address (KVA) space is a constrained resource and should not be thrown away lightly.
  3. Teach the loader to select the correct update from among those provided in some directory under /boot. This provides the most flexibility but is also the most complex solution and requires some coupling between the loader and kernel, so it has more failure modes.
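For the second option, the walk over a concatenated file might look like the sketch below. It is an illustration, not the kernel code from the patch: it reads only the header fields it needs, using the offsets given in the Intel SDM (revision at byte 4, signature at 12, processor flags at 24, data size at 28, total size at 32, in a 48-byte header), and it skips checksum verification and the optional extended signature table. The function names are hypothetical.

```python
import struct

HEADER_SIZE = 48
DEFAULT_TOTAL_SIZE = 2048  # implied when the data size field is zero


def iter_updates(blob: bytes):
    """Yield (offset, revision, signature, flags, total_size) per update."""
    off = 0
    while off + HEADER_SIZE <= len(blob):
        (revision,) = struct.unpack_from("<I", blob, off + 4)
        (signature,) = struct.unpack_from("<I", blob, off + 12)
        (flags,) = struct.unpack_from("<I", blob, off + 24)
        data_size, total_size = struct.unpack_from("<II", blob, off + 28)
        if data_size == 0:
            total_size = DEFAULT_TOTAL_SIZE
        if total_size < HEADER_SIZE or off + total_size > len(blob):
            break  # truncated or corrupt entry: stop rather than guess
        yield off, revision, signature, flags, total_size
        off += total_size


def select_update(blob: bytes, cpu_signature: int, platform_id: int,
                  current_revision: int):
    """Return (offset, revision, size) of the newest applicable update."""
    best = None
    for off, rev, sig, flags, size in iter_updates(blob):
        if sig != cpu_signature or not flags & (1 << platform_id):
            continue  # built for a different CPU or platform
        if rev <= current_revision:
            continue  # not newer than what is already running
        if best is None or rev > best[1]:
            best = (off, rev, size)
    return best
```

Only the selected update has to stay resident afterwards; the rest of the file is the part that wastes RAM and KVA if it cannot be released.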

The second approach is followed in the current patch set. To avoid wasting RAM and KVA, I did some work to make it possible to release the unused resources back to the system after the kernel has booted up. Previously, this was not possible since the kernel’s physical memory allocator was not explicitly made aware of the regions of memory which store preloaded data such as the kernel binary itself and kernel modules. This work benefits more than microcode: if you load a kernel module using a directive in loader.conf and then unload it after the system has booted, the kernel will now reuse the memory that stored the module.

So far, the in-kernel microcode loading patch implements support only for Intel platforms. This is for a couple of reasons. First, AMD most often releases microcode updates to OEMs, which then bundle and release the updates with BIOS updates. This means that the BIOS is responsible for loading microcode updates before any component of FreeBSD begins execution, but it also makes it difficult to find AMD systems for testing. Moreover, AMD releases microcode updates quite rarely when compared with Intel. Second, AMD does not have any publicly available documentation for their microcode update file format or loading interface. FreeBSD supports AMD microcode updates via the existing microcode update mechanism and will continue to do so. The code added for the new update mechanism will make it straightforward to add AMD support in the future.

The patch to provide support for early loading of Intel microcode updates is currently in review and awaiting feedback. We aim to ensure that the new functionality is available in FreeBSD 12.0, and we plan to backport it to FreeBSD 11-STABLE, ensuring that it will be available in FreeBSD 11.3.

–  Contributed by Mark Johnston