31 Dec 2020

Kernel Planet

James Bottomley: Deploying Encrypted Images for Confidential Computing

In the previous post I looked at how you build an encrypted image that can maintain its confidentiality inside AMD SEV or Intel TDX. In this post I'll discuss how you actually bring up a confidential VM from an encrypted image while preserving secrecy. However, first a warning: This post represents the state of the art and includes patches that are certainly not deployed in distributions and may not even be upstream, so if you want to follow along at home you'll need to patch things like qemu, grub and OVMF. I should also add that, although I'm trying to make everything generic to confidential environments, this post is based on AMD SEV, which is the only confidential encrypted environment currently shipping.

The Basics of a Confidential Computing VM

At its base, current confidential computing environments are about using encrypted memory to run the virtual machine and guarding the encryption key so that the owner of the host system (the cloud service provider) can't get access to it. Both SEV and TDX have the encryption technology inside the main memory controller, meaning the L1 cache isn't encrypted (still vulnerable to cache side channels) and DMA to devices must also be done via unencrypted memory. This latter also means that both the BIOS and the Operating System of the guest VM must be enlightened to understand which pages to encrypt and which must be left unencrypted. For this reason, all confidential VM systems use OVMF to boot because this contains the necessary enlightening. To a guest, the VM encryption looks identical to full memory encryption on a physical system, so as long as you have a kernel which supports Intel or AMD full memory encryption, it should boot.

Each confidential computing system has a security element which sits between the encrypted VM and the host. In SEV this is an aarch64 processor called the Platform Security Processor (PSP) and in TDX it is an SGX enclave running Intel proprietary code. The job of the PSP is to bootstrap the VM, including encrypting the initial OVMF and inserting the encrypted pages. The security element also includes a validation certificate, which incorporates a Diffie-Hellman (DH) key. Once the guest owner obtains and validates the DH key it can use it to construct a one time ECDH encrypted bundle that can be passed to the security element on bring up. This bundle includes an encryption key which can be used to encrypt secrets for the security element and a validation key which can be used to verify measurements from the security element.

The way QEMU boots a Q35 machine is to set up all the configuration (including a disk device attached to the VM Image), load OVMF into ROM memory and start the system running. OVMF pulls in the QEMU configuration and constructs the necessary ACPI configuration tables before executing grub and the kernel from the attached storage device. In a confidential VM, the first task is to establish a Guest Owner (the person whose encrypted VM it is) which is usually different from the Host Owner (the person running or controlling the Physical System). Ownership is established by transferring an encrypted bundle to the Secure Element before the VM is constructed.

The next step is for the VMM (QEMU in this case) to ask the secure element to provision the OVMF Firmware. Since the initial OVMF is untrusted, the Guest Owner should ask the Secure Element for an attestation of the memory contents before the VM is started. Since all paths lead through the Host Owner, who is also untrusted, the attestation contains a random nonce to prevent replay and is HMAC'd with a Guest Supplied key from the Launch Bundle. Once the Guest Owner is happy with the VM state, it supplies the Wrapped Key to the secure element (along with the nonce to prevent replay) and the Secure Element unwraps the key and provisions it to the VM where the Guest OS can use it for disk encryption. Finally, the enlightened guest reads the encrypted disk to unencrypted memory using DMA but uses the disk encryptor to decrypt it to encrypted memory, so the contents of the Encrypted VM Image are never visible to the Host Owner.

The Gaps in the System

The most obvious gap is that EFI booting systems don't go straight from the OVMF firmware to the OS, they have to go via an EFI bootloader (grub, usually) which must be an efi binary on an unencrypted vFAT partition. The second gap is that grub must be modified to pick the disk encryption key out of wherever the Secure Element has stashed it. The third is that the key is currently stashed in VM memory before OVMF starts, so OVMF must know not to use or corrupt the memory. A fourth problem is that the current recommended way of booting OVMF has a flash drive for persistent variable storage which is under the control of the host owner and which isn't part of the initial measurement.

Plugging The Gaps: OVMF

To deal with the problems in reverse order: the variable issue can be solved simply by not having a persistent variable store, since any mutable configuration information could be used to subvert the boot and leak the secret. This is achieved by stripping all the mutable variable handling out of OVMF. Solving key stashing simply means getting OVMF to set aside a page for a secret area and having QEMU recognise where it is for the secret injection. It turns out AMD were already working on a QEMU configuration table at a known location by the Reset Vector in OVMF, so the secret area is added as one of these entries. Once this is done, QEMU can retrieve the injection location from the OVMF binary so it doesn't have to be specified in the QEMU Machine Protocol (QMP) command. Finally OVMF can protect the secret and package it up as an EFI configuration table for later collection by the bootloader.
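As a rough illustration of what "retrieve the injection location from the OVMF binary" involves, here is a minimal Python sketch that walks a GUID-tagged table backwards from the end of a firmware image. The layout it assumes (a fixed-size tail after the table, a footer of length plus GUID, and entries of data followed by length plus GUID) is an assumption made for this example, and the GUIDs are passed in as parameters rather than hard-coded. It is not the QEMU or OVMF code.

```python
import struct
import uuid

# Assumed trailer on every entry (and on the footer): u16 length, 16-byte GUID.
ENTRY_TRAILER = struct.Struct("<H16s")


def find_table_entry(image, footer_guid, entry_guid, tail=0x20):
    """Return the data of the entry tagged entry_guid, or None.

    Hypothetical layout: the table ends `tail` bytes before the end of the
    image; the footer length covers the whole table including the footer.
    """
    end = len(image) - tail
    if end < ENTRY_TRAILER.size:
        return None
    total_len, guid = ENTRY_TRAILER.unpack_from(image, end - ENTRY_TRAILER.size)
    if guid != footer_guid.bytes_le:
        return None
    if total_len < ENTRY_TRAILER.size or total_len > end:
        return None
    start = end - total_len              # first byte of the table
    pos = end - ENTRY_TRAILER.size       # end of the last entry
    while pos - ENTRY_TRAILER.size >= start:
        entry_len, guid = ENTRY_TRAILER.unpack_from(image, pos - ENTRY_TRAILER.size)
        if entry_len < ENTRY_TRAILER.size or pos - entry_len < start:
            return None                  # malformed entry, give up
        if guid == entry_guid.bytes_le:
            return image[pos - entry_len:pos - ENTRY_TRAILER.size]
        pos -= entry_len                 # step back to the previous entry
    return None
```

If the secret area entry held, say, a 32-bit base and a 32-bit size, the caller would unpack the returned bytes with struct.unpack("<II", data). That data format is also an assumption here.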

The final OVMF change (which is in the same patch set) is to pull grub inside a Firmware Volume and execute it directly. This certainly isn't the only possible solution to the problem (adding secure boot or an encrypted filesystem were other possibilities) but it is the simplest solution that gives a verifiable component that can be invariant across arbitrary encrypted boots (so the same OVMF can be used to execute any encrypted VM securely). This latter is important because traditionally OVMF is supplied by the host owner rather than being part of the VM image supplied by the guest owner. The grub script that runs from the combined volume must still be trusted to either decrypt the root or reboot to avoid leaking the key. Although the host owner still supplies the combined OVMF, the measurement assures the guest owner of its correctness, which is why having a fairly invariant component is a good idea … so the guest owner doesn't have potentially thousands of different measurements for approved firmware.

Plugging the Gaps: QEMU

The modifications to QEMU are fairly simple: it just needs to scan the OVMF file to determine the location for the injected secret and inject it correctly using a QMP command. Since secret injection is already upstream, this is a simple find and make the location optional patch set.

Plugging the Gaps: Grub

Grub today only allows for the manual input of the cryptodisk password. However, in the cloud we can't do it this way because there's no guarantee of a secure tty channel to the VM. The solution, therefore, is to modify grub so that the cryptodisk can use secrets from a provider, in addition to the manual input. We then add a provider that can read the efi configuration tables and extract the secret table if it exists. The current incarnation of the proposed patch set is here and it allows cryptodisk to extract a secret from an efisecret provider. Note this isn't quite the same as the form expected by the upstream OVMF patch in its grub.cfg because now the provider has to be named on the cryptodisk command line thus

cryptodisk -s efisecret

but in all other aspects, Grub/grub.cfg works. I also discovered several other deviations from the initial grub.cfg (like Fedora uses /boot/grub2 instead of /boot/grub like everyone else) so the current incarnation of grub.cfg is here. I'll update it as it changes.
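To make the provider idea concrete, here is a small Python sketch of building and searching a GUID-tagged secret table, which is roughly the job a provider has once it has located the secret area. The layout (a 16-byte GUID and a 32-bit little-endian length on the table and on each entry, with lengths that include the header) is an assumption for this example, and the GUIDs are parameters. It is not the grub patch or the OVMF format definition.

```python
import struct
import uuid

# Assumed header for both the table and each entry: GUID, then u32 length
# (the length includes this 20-byte header).
HEADER = struct.Struct("<16sI")


def build_table(table_guid, entries):
    """Pack (guid, data) pairs into one table under table_guid."""
    body = b""
    for guid, data in entries:
        body += HEADER.pack(guid.bytes_le, HEADER.size + len(data)) + data
    return HEADER.pack(table_guid.bytes_le, HEADER.size + len(body)) + body


def find_secret(table, table_guid, wanted):
    """Return the data stored under the wanted GUID, or None if absent."""
    if len(table) < HEADER.size:
        return None
    guid, total = HEADER.unpack_from(table, 0)
    if guid != table_guid.bytes_le or total > len(table):
        return None
    pos = HEADER.size
    while pos + HEADER.size <= total:
        guid, length = HEADER.unpack_from(table, pos)
        if length < HEADER.size or pos + length > total:
            return None                  # malformed entry, refuse to guess
        if guid == wanted.bytes_le:
            return table[pos + HEADER.size:pos + length]
        pos += length
    return None
```

Returning None on any inconsistency mirrors the behaviour described above: if no usable secret is found, the bootloader falls back to asking for a password.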

Putting it All Together

Once you have applied all the above patches and built your version of OVMF with grub inside, you're ready to do a confidential computing encrypted boot. However, you still need to verify the measurement and inject the encrypted secret. As I said before, this isn't easy because, due to replay defeat requirements, the secret bundle must be constructed on the fly for each VM boot. From this point on I'm going to be using only AMD SEV as the example because the Intel hardware doesn't yet exist and AMD kindly gave IBM research a box to play with (Anyone with a new EPYC 7xx1 or 7xx2 based workstation can likely play along at home, but check here). The first thing you need to do is construct a launch bundle. AMD has a tool called sev-tool to do this for you, but before you can use it you need to obtain the platform Diffie-Hellman certificate (pdh.cert). The tool will extract this for you

sevtool --pdh_cert_export

Or it can be given to you by the cloud service provider (in this latter case you'll want to verify the provenance using sevtool --validate_cert_chain, which contacts the AMD site to verify all the details). Once you have a trusted pdh.cert, you can use this to generate your own guest owner DH cert (godh.cert) which should be used only one time to give a semblance of ECDHE. godh.cert is used with pdh.cert to derive an encryption key for the launch bundle. You can generate this with

sevtool --generate_launch_blob <policy>

The gory details of policy are in the SEV manual chapter 3, but most guests use 1 which means no debugging. This command will generate the godh.cert, the launch_blob.bin and a tmp_tk.bin file which you must save and keep secure because it contains the Transport Encryption and Integrity Keys (TEK and TIK) which will be used to encrypt the secret. Figuring out the qemu command line options needed to launch and pause a SEV guest is a bit of a palaver, so here is mine. You'll likely need to change things, like the QMP port and the location of your OVMF build and the launch secret.
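Going back to the policy argument: it is a bit field, which is why 1 means "no debugging". The sketch below decodes the low bits into names. The bit names are my reading of the SEV API specification and should be checked against the manual before you rely on them. The helper itself is hypothetical and not part of sevtool.

```python
# Assumed names for the low policy bits (verify against the SEV manual).
POLICY_BITS = {
    0: "NODBG",    # debugging of the guest is disallowed
    1: "NOKS",     # key sharing with other guests is disallowed
    2: "ES",       # SEV-ES is required
    3: "NOSEND",   # sending the guest to another platform is disallowed
    4: "DOMAIN",   # guest must not leave the domain
    5: "SEV",      # guest must not go to a non-SEV platform
}


def decode_policy(policy):
    """Return the names of the policy bits set in an integer policy value."""
    return [name for bit, name in sorted(POLICY_BITS.items())
            if policy & (1 << bit)]
```

With these assumptions, decode_policy(1) gives just the no-debug flag, matching the policy most guests use.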

Finally you need to get the launch measure from QMP, verify it against the sha256sum of OVMF.fd and create the secret bundle with the correct GUID headers. Since this is really fiddly to do with sevtool, I wrote this python script to do it all (note it requires qmp.py from the qemu git repository). You execute it as

sevsecret.py --passwd <disk passwd> --tiktek-file <location of tmp_tk.bin> --ovmf-hash <hash> --socket <qmp socket>

And it will verify the launch measure and encrypt the secret for the VM if the measure is correct and start the VM. If you got everything correct the VM will simply boot up without asking for a password (if you inject the wrong secret, it will still ask). And there you have it: you've booted up a confidential VM from an encrypted image file. If you're like me, you'll also want to fire up gdb on the qemu process just to show that the entire memory of the VM is encrypted …
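For readers who want to see what "verify the launch measure" amounts to, here is a minimal Python sketch of the check. It is not the script above. It assumes the QMP measure is base64 of a 32-byte HMAC followed by a 16-byte nonce, and that the HMAC input is 0x04, the firmware API major, API minor and build numbers, the 32-bit little-endian policy, the SHA-256 of the firmware and the nonce, keyed with the TIK. That field order is my reading of the SEV API specification and should be verified against it. How you load the TIK from tmp_tk.bin is not shown.

```python
import base64
import hashlib
import hmac
import struct


def split_measure(b64_measure):
    """Split the base64 launch measure into (hmac, nonce) (assumed layout)."""
    raw = base64.b64decode(b64_measure)
    if len(raw) != 48:
        raise ValueError("expected 32-byte measurement plus 16-byte nonce")
    return raw[:32], raw[32:]


def expected_measure(tik, api_major, api_minor, build, policy, digest, nonce):
    """Recompute the HMAC the guest owner expects (assumed field order)."""
    msg = bytes([0x04, api_major, api_minor, build])
    msg += struct.pack("<I", policy) + digest + nonce
    return hmac.new(tik, msg, hashlib.sha256).digest()


def verify_measure(b64_measure, tik, api_major, api_minor, build, policy, firmware):
    """Return True if the reported measure matches the firmware we trust."""
    measure, nonce = split_measure(b64_measure)
    digest = hashlib.sha256(firmware).digest()
    want = expected_measure(tik, api_major, api_minor, build, policy, digest, nonce)
    # Constant-time comparison, as for any MAC check.
    return hmac.compare_digest(measure, want)
```

Because the nonce is chosen by the secure element and changes on every launch, a measure recorded from an earlier boot will not verify against a later one, which is the replay protection described earlier.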

Conclusions and Caveats

The above script should allow you to boot an encrypted VM anywhere: locally or in the cloud, provided you can access the QMP port (most clouds use libvirt which introduces yet another additional layering pain). The biggest drawback, if you refer to the diagram, is the yellow box: you must trust the secure element, which in both Intel and AMD is proprietary, in order to get confidential computing to work. Although there is hope that in future the secure element could be fully open source, it isn't today.

The next annoyance is that launching a confidential VM is high touch requiring collaboration from both the guest owner and the host owner (due to the anti-replay nonce). For a single launch, this is a minor annoyance but for an autoscaling (launch VMs as needed) platform it becomes a major headache. The solution seems to be to have some Hardware Security Module (HSM), like the cloud uses today to store encryption keys securely, and have it understand how to measure and launch encrypted VMs on behalf of the guest owner.

The final conclusion to remember is that confidentiality is not security: your VM is as exploitable inside a confidential encrypted VM as it was outside. In many ways confidentiality and security are opposites, in that security in part requires reducing the trusted code and confidentiality requires pulling as much as possible inside. Confidential VMs do have an answer to the Cloud trust problem since the enterprise can now deploy VMs without fear of tampering by the cloud provider, but those VMs are as insecure in the cloud as they were in the Enterprise Data Centre. All of this argues that Confidential Computing, while an important milestone, is only one step on the journey to cloud security.

Patch Status

The OVMF patches are upstream (including modifications requested by Intel for TDX). The QEMU and grub patch sets are still on the lists.

31 Dec 2020 10:40pm GMT

30 Dec 2020


Paul E. McKenney: Parallel Programming: December 2020 Update

This release of Is Parallel Programming Hard, And, If So, What Can You Do About It? features numerous improvements:

  1. LaTeX and build-system upgrades (including helpful error checking and reporting), formatting improvements (including much nicer display of hyperlinks and of Quick Quizzes, polishing of numerous figures and tables, plus easier builds for A4 paper), refreshing of numerous broken URLs, an improved "make help" command (see below), improved FAQ-BUILD material, and a prototype index, all courtesy of Akira Yokosawa.
  2. A lengthy Quick Quiz on the relationship of half-barriers, compilers, CPUs, and locking primitives, courtesy of Patrick Yingxi Pan.
  3. Updated performance results throughout the book, courtesy of a large x86 system kindly provided by Facebook.
  4. Compiler tricks, RCU semantics, and other material from the Linux-kernel memory model added to the memory-ordering and tools-of-the-trade chapters.
  5. Improved discussion of non-blocking-synchronization algorithms.
  6. Many new citations, cross-references, fixes, and touchups throughout the book.

A number of issues were spotted by Motohiro Kanda in the course of his translation of this book to Japanese, and Borislav Petkov, Igor Dzreyev, and Junchang Wang also provided much-appreciated fixes.

The output of the aforementioned make help is as follows:

Official targets (Latin Modern Typewriter for monospace font):
  Full,              Abbr.
  perfbook.pdf,      2c:   (default) 2-column layout
  perfbook-1c.pdf,   1c:   1-column layout

Set env variable PERFBOOK_PAPER to change paper size:
   PERFBOOK_PAPER=A4: a4paper
   PERFBOOK_PAPER=HB: hard cover book
   other (default):   letterpaper

"make help-full" will show the full list of available targets.

The following excerpt of the make help-full command's output might be of interest to those who find Quick Quizzes distracting:

Experimental targets:
  Full,              Abbr.
  perfbook-qq.pdf,   qq:   framed Quick Quizzes
  perfbook-nq.pdf,   nq:   no inline Quick Quizzes (chapterwise Answers)

Thus, the make nq command creates a perfbook-nq.pdf with Quick Quizzes and their answers grouped at the end of each chapter, in the usual textbook style, while still providing PDF navigation from each Quick Quiz to the relevant portion of that chapter.

Finally, this release also happens to be the first release candidate for the long-awaited Second Edition, which should be available shortly.

30 Dec 2020 5:33am GMT

23 Dec 2020


James Bottomley: Building Encrypted Images for Confidential Computing

With both Intel and AMD announcing confidential computing features to run encrypted virtual machines, IBM research has been looking into a new format for encrypted VM images. The first question is why a new format is needed; after all, qcow2 only recently deprecated its old encrypted image format in favour of luks. The problem is that in confidential computing, the guest VM runs inside the secure envelope but the host hypervisor (including the QEMU process) is untrusted and thus runs outside it. Unfortunately, even for the new luks format, the encryption of the image is handled by QEMU, so the encryption key would be outside the secure envelope. Thus, a new format is needed to keep the encryption key (and, indeed, the encryption mechanism) within the guest VM itself. Fortunately, encrypted boot of Linux systems has been around for a while, and this can be used as a practical template for constructing a fully confidential encrypted image format and maintaining that confidentiality within a hostile cloud environment. In this article, I'll explore the state of the art in encrypted boot, constructing EFI encrypted boot images, and finally, in the follow on article, look at deploying an encrypted image into a confidential environment and maintaining key secrecy in the cloud.

Encrypted Boot State of the Art

Luks and the cryptsetup toolkit have been around for a while and recently (in 2018), the luks format was updated to version 2. However, actually booting a linux kernel from an encrypted partition has always been a bit of a systems problem, primarily because the bootloader (grub) must decrypt the partition to actually load the kernel. Fortunately, grub can do this, but unfortunately the current grub in most distributions (2.04) can only read the version 1 luks format. Secondly, the user must type the decryption passphrase into grub (so it can pull the kernel and initial ramdisk out of the encrypted partition to boot them), but grub currently has no mechanism to pass it on to the initial ramdisk for mounting root, meaning that either the user has to type their passphrase twice (annoying) or the initial ramdisk itself has to contain a file with the disk passphrase. This latter is the most commonly used approach and only has minor security implications when the system is in motion (the ramdisk and the key file must be root read only) and the password is protected at rest by the fact that the initial ramdisk is also on the encrypted volume. Even more annoying is the fact that there is no distribution standard way of creating the initial ramdisk. Debian (and Ubuntu) have the most comprehensive documentation on how to do this, so the next section will look at the much less well documented systemd/dracut mechanism.

Encrypted Boot for Systemd/Dracut

Part of the problem here seems to be less than stellar systems co-ordination between the two components. Additionally, the way systemd supports passphraseless encrypted volumes has been evolving for a while but changed again in v246 to mirror the Debian method. Since cloud images are usually pretty up to date, I'll describe this new way. Each encrypted volume is referred to by UUID (which will be the UUID of the containing partition returned by blkid). To get dracut to boot from an encrypted partition, you must pass in

rd.luks.uuid=<UUID>

but you must also have a key file named

/etc/cryptsetup-keys.d/luks-<UUID>.key

And, since dracut hasn't yet caught up with this, you usually need a cryptodisk.conf file in /etc/dracut.conf.d/ which contains

install_items+=" /etc/cryptsetup-keys.d/* "

Grub and EFI Booting Encrypted Images

Traditionally grub is actually installed into the disk master boot record, but for EFI boot that changed and the disk (or VM image) must have an EFI System partition which is where the grub.efi binary is installed. Part of the job of the grub.efi binary is to find the root partition and source the /boot/grub/grub.cfg. When you install grub on an EFI partition a search for the root by UUID is actually embedded into the grub binary. Another problem is likely that your distribution customizes the location of grub and updates the boot variables to tell the system where it is. However, a cloud image can't rely on the boot variables and must be installed in the default location (\EFI\BOOT\bootx64.efi). This default location can be achieved by adding the --removable flag to grub-install.

For encrypted boot, this becomes harder because the grub in the EFI partition must set up the cryptographic location by UUID. However, if you add

GRUB_ENABLE_CRYPTODISK=y

to /etc/default/grub it will do the necessary in grub-install and grub-mkconfig. Note that on Fedora, where every other GRUB_ENABLE parameter is true/false, this must be 'y' because grub-install looks for =y, not =true.

Putting it all together: Encrypted VM Images

Start by extracting the root of an existing VM image to a tar file. Make sure it has all the tools you will need, like cryptodisk and grub-efi. Create a two partition raw image file (I usually like 4GB) with a small efi partition (p1) and an encrypted root (p2), ready to be loopback mounted:

truncate -s 4GB disk.img
parted disk.img mklabel gpt
parted disk.img mkpart primary 1Mib 100Mib
parted disk.img mkpart primary 100Mib 100%
parted disk.img set 1 esp on
parted disk.img set 1 boot on

Now setup the efi and cryptosystem (I use ext4, but it's not required). Note at this time luks will require a password. Use a simple one and change it later. Also note that most encrypted boot documents advise filling the encrypted partition with random numbers. I don't do this because the additional security afforded is small compared with the advantage of converting the raw image to a smaller qcow2 one.

losetup -P -f disk.img          # assuming here it uses loop0
l=($(losetup -l|grep disk.img)) # verify with losetup -l
mkfs.vfat ${l}p1
blkid ${l}p1       # remember the EFI partition UUID
cryptsetup --type luks1 luksFormat ${l}p2 # choose temp password
blkid ${l}p2       # remember this as <UUID> you'll need it later 
cryptsetup luksOpen ${l}p2 cr_root
mkfs.ext4 /dev/mapper/cr_root
mount /dev/mapper/cr_root /mnt
tar -C /mnt -xpf <vm root tar file>
for m in run sys proc dev; do mount --bind /$m /mnt/$m; done
chroot /mnt

Create or modify /etc/fstab to have root as /dev/mapper/cr_root and the EFI partition by label under /boot/efi. Now set up grub for encrypted boot:

echo "GRUB_ENABLE_CRYPTODISK=y" >> /etc/default/grub
mount /boot/efi
grub-install --removable --target=x86_64-efi
grub-mkconfig -o /boot/grub/grub.cfg

For Debian, you'll need to add an /etc/crypttab entry for the encrypted disk:

cr_root UUID=<UUID> none luks

And then re-create the initial ramdisk. For dracut systems, you'll have to modify /etc/default/grub so the GRUB_CMDLINE_LINUX has a rd.luks.uuid=<UUID> entry. If this is a selinux based distribution, you may also have to trigger a relabel.
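The order of the crypttab fields matters: name, device, key file, then options. As a small illustration, this Python sketch splits a crypttab line into those four fields so you can sanity check an entry before rebuilding the ramdisk. The function name and the returned dictionary are made up for this example, and it only handles the simple whitespace-separated form shown here.

```python
def parse_crypttab_line(line):
    """Split one crypttab line into name, device, keyfile and options.

    Returns None for blank lines and comments. A key file of "none" or "-"
    (meaning: prompt for a passphrase) is returned as None.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("crypttab needs at least a name and a device")
    keyfile = fields[2] if len(fields) > 2 else "none"
    options = fields[3].split(",") if len(fields) > 3 else []
    return {
        "name": fields[0],
        "device": fields[1],
        "keyfile": None if keyfile in ("none", "-") else keyfile,
        "options": options,
    }
```

Run against an entry with the fields swapped, the result shows "luks" as the key file, which makes that kind of mistake easy to spot.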

Now would also be a good time to make sure you have a root password you know or to install /root/.ssh/authorized_keys. You should unmount all the binds and /mnt and try EFI booting the image. You'll still have to type the password a couple of times, but once the image boots you're operating inside the encrypted envelope. All that remains is to create a fast boot high entropy low iteration password and replace the existing one with it and set the initial ramdisk to use it. This example assumes your image is mounted as SCSI disk sda, but it may be a virtual disk or some other device.

dd if=/dev/urandom bs=1 count=33|base64 -w 0 > /etc/cryptsetup-keys.d/luks-<UUID>.key
chmod 600 /etc/cryptsetup-keys.d/luks-<UUID>.key
cryptsetup --key-slot 1 luksAddKey /dev/sda2 # permanent recovery key
cryptsetup --key-slot 0 luksRemoveKey /dev/sda2 # remove temporary
cryptsetup --key-slot 0 --iter-time 1 luksAddKey /dev/sda2 /etc/cryptsetup-keys.d/luks-<UUID>.key

Note the "-w 0" is necessary to prevent the password from having a trailing newline which will make it difficult to use. For mkinitramfs systems, you'll now need to modify the /etc/crypttab entry

cr_root UUID=<UUID> /etc/cryptsetup-keys.d/luks-<UUID>.key luks

For dracut you need the key install hook in /etc/dracut.conf.d as described above and for Debian you need the keyfile pattern:

echo "KEYFILE_PATTERN=\"/etc/cryptsetup-keys.d/*\"" >>/etc/cryptsetup-initramfs/conf-hook

You now rebuild the initial ramdisk, and you should be able to boot the cryptosystem using either the high entropy password or your rescue one. It should only prompt in grub and shouldn't prompt again. This image file is now ready to be used for confidential computing.
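As an aside on the key generation step earlier in this section, the same thing can be sketched in Python, which also shows why 33 random bytes is a convenient size: it is a multiple of three, so the base64 form is exactly 44 characters with no "=" padding and no newline. The function names are made up for this example. It creates the file with mode 0600 from the start rather than fixing the mode afterwards.

```python
import base64
import os


def make_key(nbytes=33):
    """Return nbytes of kernel randomness as base64 with no trailing newline."""
    return base64.b64encode(os.urandom(nbytes))


def write_key(path, key):
    """Write the key to a new file readable and writable only by its owner."""
    # O_EXCL makes this fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
```

You would still add the resulting file to a luks key slot with cryptsetup as shown above.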

23 Dec 2020 6:10pm GMT