Upstream Image Encryption Standardization #560

Open
1 of 5 tasks
josephineSei opened this issue Apr 11, 2024 · 72 comments
Labels
  • question: Further information is requested
  • SCS-VP10: Related to tender lot SCS-VP10
  • standards: Issues / ADR / pull requests relevant for standardization & certification
  • upstream: Implemented directly in the upstream

Comments

@josephineSei
Contributor

josephineSei commented Apr 11, 2024

Currently there is the possibility in Cinder to encrypt volumes and in Nova to use qcow2 encrypted images (still under development).
Both can lead to and use LUKS-encrypted images, but those are different and not aligned:

  • qcow2 with LUKS in it for Nova
  • LUKS raw blocks for Cinder

@markus-hentsch also found out in #541 that uploading a LUKS-encrypted image (that was created from a volume) to another cloud, in combination with setting a few parameters (cinder_encryption_key, etc.), results in an image that can be used to create an encrypted and functional volume.

As a user, it would be good to have a streamlined way to use encrypted images in OpenStack for both volumes and ephemeral storage, and to allow interoperability between clouds.
We therefore need, and will propose, standardized parameters to describe and detect an encrypted image. These might be similar to the parameters described here: https://specs.openstack.org/openstack/cinder-specs/specs/zed/image-encryption.html
However, they will use LUKS encryption.
Such encrypted images could then be natively mounted in Nova or simply turned into a volume (raw LUKS images can be used directly, qcow images need to be flattened).

This way, encrypted backup images can be easily downloaded and transferred to another cloud.

  • A spec has to be written for Glance (and maybe Cinder?)
  • Implementation in Glance to standardize image parameters
  • Implementation in OSC/SDK to encrypt while uploading and decrypt while downloading an image
  • Implementation in Cinder to align to new standard parameters
  • Implementation in Cinder to flatten encrypted qcow to raw volumes

This is the result of a lengthy discussion at the PTG with Nova, Cinder and Glance ( https://etherpad.opendev.org/p/dalmatian-ptg-cinder#L376 ).

Follow-up tasks may include implementing re-encryption to fully change keys for LUKS volumes and images.

@josephineSei josephineSei added the question Further information is requested label Apr 11, 2024
@josephineSei josephineSei added SCS-VP10 Related to tender lot SCS-VP10 standards Issues / ADR / pull requests relevant for standardization & certification labels Apr 11, 2024
@josephineSei
Contributor Author

I have discussed and outlined the spec with @markus-hentsch and am currently writing it.

@markus-hentsch
Contributor

markus-hentsch commented Apr 11, 2024

On the user side we'd need to be able to create LUKS-encrypted files. Nova devs made us aware that the QEMU tooling is supposedly able to do so.

Here is the Nova implementation:

Here's a shortened example:

echo "muchsecretsuchwow" > secret_file.key

qemu-img convert -f raw -O luks --object secret,id=sec,file=secret_file.key -o key-secret=sec \
-o cipher-alg=aes-256 -o cipher-mode=xts -o hash-alg=sha256 \
 -o ivgen-alg=plain64 -o ivgen-hash-alg=sha256 \
$INPUT_FILE $OUTPUT_LUKS_FILE

It works within a Docker container running Ubuntu LTS but interestingly it doesn't on my NixOS setup, failing with:

qemu-img: output.luks: error while converting luks: Unsupported cipher mode xts

Versions of qemu-img are 6.2.0 (Ubuntu) and 8.1.5 (NixOS) respectively. I wonder if there is another system dependency in play here or if it's the version difference. We should investigate because this impacts the ability to create images on the client side.

EDIT:

Tried it on ubuntu:noble image which ships qemu-img 8.2.1 and it still works. So no deprecation problem here in regards to the qemu-img version. I think we can safely ignore it as a NixOS-specific issue then.
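
For scripting the conversion on the client side, the command above could be wrapped roughly like this. This is only a sketch: the function name build_luks_convert_cmd and its arguments are made up for illustration, and the qemu-img options are copied unchanged from the shortened example.

```python
def build_luks_convert_cmd(input_file, output_file, key_file, secret_id="sec"):
    """Build the argv list for the qemu-img call from the example above.

    The helper name and signature are hypothetical; only the qemu-img
    options themselves are taken from the example.
    """
    options = [
        f"key-secret={secret_id}",
        "cipher-alg=aes-256",
        "cipher-mode=xts",
        "hash-alg=sha256",
        "ivgen-alg=plain64",
        "ivgen-hash-alg=sha256",
    ]
    cmd = [
        "qemu-img", "convert", "-f", "raw", "-O", "luks",
        "--object", f"secret,id={secret_id},file={key_file}",
    ]
    # Each creation option is passed as its own "-o" argument.
    for option in options:
        cmd += ["-o", option]
    cmd += [str(input_file), str(output_file)]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True), so a failure such as the "Unsupported cipher mode xts" error above shows up as an exception instead of passing silently.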

@markus-hentsch
Contributor

As I was adding restore instructions to the user data backup guide in SovereignCloudStack/docs#176 I reproduced the process of creating volumes from previously encrypted (LUKS) images originating from Cinder itself on my DevStack and noticed that Cinder does not verify that the target volume actually has an encrypted type:

$ file image.raw
image.raw: LUKS encrypted file, ver 1 [aes, xts-plain64, sha256]

$ file image.key
image.key: data

$ openstack secret store --algorithm aes --bit-length 256 --mode cbc \
  --secret-type symmetric --file image.key --name restored-image-key
  
$ openstack secret list -f value -c "Secret href" -c "Name"
http://10.0.1.116/key-manager/v1/secrets/6ea6b0a8-de50-45b8-90b7-9470c4dd201a restored-image-key

$ export SECRET_ID=6ea6b0a8-de50-45b8-90b7-9470c4dd201a

$ openstack image create --file image.raw \
  --property cinder_encryption_key_id=$SECRET_ID \
  --property cinder_encryption_key_deletion_policy=on_image_deletion \
  restored-image

$ openstack volume create --size 1 \
  --image restored-image volume-restored-notype

$ openstack volume show -f value -c type volume-restored-notype
lvmdriver-1  # <- this is an *unencrypted* volume type! (which contains LUKS blocks now)

$ openstack volume create --size 1 \
  --image restored-image --type lvmdriver-1-LUKS \
  volume-restored-lukstype

$ openstack volume show -f value -c type volume-restored-lukstype
lvmdriver-1-LUKS

$ openstack server create \
  --volume volume-restored-notype \
  ... server-from-untyped-volume
  
$ openstack server create \
  --volume volume-restored-lukstype \
  ... server-from-luks-volume

This results in the server server-from-untyped-volume being stuck at "No bootable device found" on the virtual console. Only server-from-luks-volume where a LUKS volume type was explicitly chosen while restoring the LUKS image was bootable.

It seems that this is an oversight in Cinder in its current implementation?
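
To illustrate the kind of check that appears to be missing, here is a minimal sketch. It is not Cinder code: the function restore_warnings and its arguments are hypothetical, and the only real name used is the cinder_encryption_key_id image property from the commands above.

```python
def restore_warnings(image_properties, volume_type_is_encrypted):
    """Return warnings for a volume-from-image request.

    image_properties: dict of Glance image properties.
    volume_type_is_encrypted: whether the chosen volume type has an
    encryption type attached (assumed to be known to the caller).
    """
    warnings = []
    image_is_encrypted = "cinder_encryption_key_id" in image_properties
    if image_is_encrypted and not volume_type_is_encrypted:
        warnings.append(
            "image carries cinder_encryption_key_id but the target volume "
            "type is not encrypted; the volume will contain raw LUKS blocks"
        )
    return warnings
```

With the properties of restored-image, this would return one warning for lvmdriver-1 and none for lvmdriver-1-LUKS.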

@josephineSei
Contributor Author

I tested this with a simple round trip from an encrypted volume to an encrypted image and back to a volume:

stack@devstack:~/devstack$ openstack image show encrypted-volume-image
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                                                                                                   |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| checksum         | 4f0538390cb7728f6e02d6a58a824e11                                                                                                                                        |
| container_format | bare                                                                                                                                                                    |
| created_at       | 2024-04-11T13:09:08Z                                                                                                                                                    |
| disk_format      | raw                                                                                                                                                                     |
| file             | /v2/images/49febf2e-4f20-47f0-8230-660c2dcb19dc/file                                                                                                                    |
| id               | 49febf2e-4f20-47f0-8230-660c2dcb19dc                                                                                                                                    |
| min_disk         | 0                                                                                                                                                                       |
| min_ram          | 0                                                                                                                                                                       |
| name             | encrypted-volume-image                                                                                                                                                  |
| owner            | 15f2ab0eaa5b4372b759bde609e86224                                                                                                                                        |
| properties       | cinder_encryption_key_deletion_policy='on_image_deletion', cinder_encryption_key_id='286339bc-5484-4a27-b24a-816f89a2968c', hw_rng_model='virtio', locations='[{'url':  |
|                  | 'rbd://adbc1d67-adf7-4a13-94b5-2a2570d41ed9/images/49febf2e-4f20-47f0-8230-660c2dcb19dc/snap', 'metadata': {}}]', os_hash_algo='sha512',                                |
|                  | os_hash_value='7160b7451962496108194179757221dbbb5af7654b205a63cf6284a70dd1caed8ee289b98a56f0cb0d37e363adc18cc334d0f1bf6be937e1e9af9036009b0856', os_hidden='False',    |
|                  | owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros-0.6.2-x86_64-disk', owner_specified.openstack.sha256='', signature_verified='False'   |
| protected        | False                                                                                                                                                                   |
| schema           | /v2/schemas/image                                                                                                                                                       |
| size             | 3221225472                                                                                                                                                              |
| status           | active                                                                                                                                                                  |
| tags             |                                                                                                                                                                         |
| updated_at       | 2024-04-11T13:11:31Z                                                                                                                                                    |
| virtual_size     | 3221225472                                                                                                                                                              |
| visibility       | shared                                                                                                                                                                  |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
stack@devstack:~/devstack$ openstack volume create --size 3 --image encrypted-volume-image volume-from-LUKS-image
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2024-04-12T11:23:52.246101           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | f790ecd4-eea6-4a8d-8cf6-d1a038c91d60 |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | volume-from-LUKS-image               |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 3                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | ceph                                 |
| updated_at          | None                                 |
| user_id             | 6cf194afebb6469e8423f50500b5c3fc     |
+---------------------+--------------------------------------+

A seemingly unencrypted volume is created from the encrypted image. But as far as we know, there is no decrypting mechanism implemented in Cinder to go from such a Cinder-specific LUKS encryption in the image to an unencrypted volume.

We should definitely file a bug in Cinder for this one.
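
To confirm that the data in such a volume or image is still LUKS, one can look at the first bytes, which is what the `file` output in the earlier comment is based on. A small sketch, assuming the LUKS1 on-disk header layout (6 magic bytes, a 2-byte big-endian version, then three 32-byte strings); the function name parse_luks_header is made up:

```python
import struct

LUKS_MAGIC = b"LUKS\xba\xbe"


def parse_luks_header(header: bytes):
    """Return basic LUKS header info, or None if the data is not LUKS.

    Only the cipher fields of version 1 headers are decoded here.
    """
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    (version,) = struct.unpack(">H", header[6:8])
    info = {"version": version}
    if version == 1 and len(header) >= 104:
        fields = struct.unpack("32s32s32s", header[8:104])
        names = ("cipher_name", "cipher_mode", "hash_spec")
        for name, raw in zip(names, fields):
            # Fields are NUL-padded ASCII strings.
            info[name] = raw.split(b"\0", 1)[0].decode("ascii", "replace")
    return info
```

Reading the first 104 bytes of a downloaded image file and passing them in is enough. Note that this only recognizes raw LUKS data; a qcow2 file with LUKS inside starts with the qcow2 magic instead.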

@josephineSei
Contributor Author

How to test whether a server booted

When creating a server from a volume, as in my case above, it will be shown as ACTIVE even though it cannot actually have booted:

stack@devstack:~/devstack$ openstack server list
+-----------------+-----------------+--------+-----------------+------------------+----------+
| ID              | Name            | Status | Networks        | Image            | Flavor   |
+-----------------+-----------------+--------+-----------------+------------------+----------+
| 8ba1b529-d1d9-  | will-this-      | ACTIVE | private=10.0.0. | N/A (booted from | m1.small |
| 4f5b-a97d-      | server-boot?    |        | 47, fd13:d046:e | volume)          |          |
| 6756f61451e0    |                 |        | 727:0:f816:3eff |                  |          |
|                 |                 |        | :feae:b8e1      |                  |          |
| 66f8f821-ec26-  | my-new-server   | ACTIVE | private=10.0.0. | N/A (booted from | m1.small |
| 4264-807e-      |                 |        | 41, fd13:d046:e | volume)          |          |
| 36ec016d51f9    |                 |        | 727:0:f816:3eff |                  |          |
|                 |                 |        | :fe98:3e70      |                  |          |
+-----------------+-----------------+--------+-----------------+------------------+----------+

To show the boot log of a server the following command can be used:

stack@devstack:~/devstack$ openstack console log show my-new-server > other-server.log

The log will then look something like:

[    0.000000] Linux version 5.15.0-71-generic (buildd@lcy02-amd64-044) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0>
[    0.000000] Command line: LABEL=cirros-rootfs ro console=tty1 console=ttyS0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Hygon HygonGenuine
[    0.000000]   Centaur CentaurHauls
[    0.000000]   zhaoxin   Shanghai
[    0.000000] x86/fpu: x87 FPU will use FXSAVE
[    0.000000] signal: max sigframe size: 1440
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdcfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007ffdd000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
.....

But if the server was created from an unencrypted volume that itself was created from an encrypted image, the log remains empty:

stack@devstack:~/devstack$ openstack console log show 02d13a4c-a6a0-4b40-850c-9b7ddd13d4bb
     <- log output should be here
stack@devstack:~/devstack$
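
The check described above can be automated with a small heuristic. This is just a sketch: the function name has_boot_output is made up, and an empty console log is only an indication that the server did not boot, not proof.

```python
def has_boot_output(console_log: str) -> bool:
    """Return True if the console log contains at least one non-blank line."""
    return any(line.strip() for line in console_log.splitlines())
```

The input would be the text returned by `openstack console log show <server>`, for example the contents of other-server.log from above.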

@josephineSei
Contributor Author

I uploaded the new spec to gerrit: https://review.opendev.org/c/openstack/glance-specs/+/915726

@markus-hentsch
Contributor

I reported the bug uncovered in #560 (comment) at https://bugs.launchpad.net/cinder/+bug/2061154

@josephineSei
Contributor Author

I reported the bug uncovered in #560 (comment) at https://bugs.launchpad.net/cinder/+bug/2061154

I raised this on IRC; this was the answer:

<Luzi> eharney, while trying some things with the LUKS-image we found this behavior: https://bugs.launchpad.net/cinder/+bug/2061154
<Luzi> is this known?
<eharney> Luzi: i think this is done on purpose because we didn't want to have users accidentally decrypting volume images, but it might still be useful to let them copy the data around (rather than failing)
<eharney> it would make sense to document this behavior because it is kind of surprising in the standard use of volumes/images
<eharney> generally, people should make sure they are using an encrypted volume type
<Luzi> or maybe add a check for the a volume type with encryption type?
<eharney> i think we didn't want it to fail because there are situations (copying data around to different places rather than booting instances) where it's useful to allow this

@anjastrunk anjastrunk added the upstream Implemented directly in the upstream label Apr 15, 2024
@anjastrunk anjastrunk mentioned this issue Apr 15, 2024
59 tasks
@josephineSei
Contributor Author

We are currently evaluating whether we will also need a spec for Cinder. What we definitely need is a blueprint for Cinder to track all development. I have written a blueprint and tried to include all points where implementation will be needed in Cinder:
https://blueprints.launchpad.net/cinder/+spec/standardize-image-encryption-metadata
In my opinion there should be a spec, to at least show all use cases where encrypted images need to be considered.

@josephineSei
Contributor Author

We got a review for the Spec (I am currently updating it) and there is one last question to discuss:

We wanted to introduce a new container format "encrypted" - so it would be easily visible to anyone, that this image is encrypted. To identify what the underlying format is, we introduced a property: "os_decrypt_container_format".

Now Nova and Cinder would both like to have the container format showing up in the original property, and Cinder would additionally need some parameter to know whether the encrypted image is compressed or not.

The thing here is that we could check whether encrypted images are always qcow or raw when container_format and decrypt_container_format are set. However, the metadata could be set after creating an image, in the upload step. So it might be allowed to create an encrypted image with a format neither Cinder nor Nova could use. We would like to avoid such a bad user experience.

So we want to ask the following questions in the Glance meeting this Thursday:

  1. Would it be feasible to introduce a new top-level property "encrypted"?
  2. What would be the implications?

@josephineSei
Contributor Author

In today's Cinder meeting I was asked to also create a small Cinder spec: https://etherpad.opendev.org/p/cinder-dalmatian-meetings

@josephineSei
Contributor Author

I created the Cinder spec today: https://review.opendev.org/c/openstack/cinder-specs/+/919499

@josephineSei
Contributor Author

We got reviews on the spec, and Markus and I are going through them and answering questions.

@josephineSei
Contributor Author

I adjusted the Glance spec and removed the container_format 'encrypted', because this was one of the most discussed parts of the spec: https://review.opendev.org/c/openstack/glance-specs/+/915726

@josephineSei
Contributor Author

I got feedback on the Glance patch with the question of what to do when the image conversion plugin is activated.
Among other things, it needs this config adjustment: https://github.com/openstack/glance/blob/master/etc/glance-image-import.conf.sample#L26

Image conversion, as I understand it, is not something that can be triggered by a CLI command. Instead, when the plugin is activated, ALL images that are created are automatically converted to the format specified in the config, after being uploaded and before being stored.

This creates a few questions regarding encrypted images:

  1. Is this behavior also triggered when Nova or Cinder upload an image? If yes, it will currently result in unusable images or worse: strange errors. If not, this feature is not very well implemented, because in this case disk formats will differ instead of all being the same (as assumed by the user or the plugin).
  2. The currently supported target formats are qcow2, raw and vmdk. We will only allow qcow2 and raw images to be encrypted and uploaded, because other formats cannot be used by Cinder and Nova. Converting qcow2 to raw is possible; the reverse needs testing. But we do not support conversion to vmdk, because we don't know the behavior or whether transforming from encrypted qcow2/raw to encrypted vmdk would even be possible with qemu, AND neither Cinder nor Nova use the encrypted vmdk format. So we would have to convert again in Nova and Cinder.

As this optional feature of Glance may already have problems with 1, and we would need to at least forbid uploading encrypted images when the target format is vmdk (maybe also when the target format is qcow2), this would be a lot more implementation work. So for now we will declare this out of scope for the spec. Maybe this can be done after the image encryption is in place, if it is needed by operators and users.

@markus-hentsch
Contributor

I got feedback on the Glance patch with the question about what to do when the image conversion plugin is activated. It needs among other things this config adjustment: https://github.com/openstack/glance/blob/master/etc/glance-image-import.conf.sample#L26

Image conversion as I understand it, is not something that can be triggered by a CLI command. Instead when the plugin is activated ALL images that are created will be automatically converted to the ( through the config ) specified format after uploading and before storing it.

This creates a few questions regarding encrypted images:

1. Is this behavior also triggered when Nova or Cinder upload an image? If yes, then it will currently result in unusable images or worse: strange Errors. If not, this feature is not very well implemented, because in this case disk-formats will differ and not all be the same (as assumed by the user or the plugin).

2. The currently supported target formats are qcow2, raw and vmdk. We will only allow qcow2 and raw images to be encrypted and uploaded, because other ones cannot be used by Cinder and Nova. Converting qcow2 to raw is possible, vice-versa needs testing. But we do not support conversion to vmdk, because we don't know the behavior and whether transforming from encrypted qcow2/raw to encrypted vmdk would be possible with qemu AND neither Cinder nor Nova use the encrypted vmdk format. So we would have to convert again in Nova and Cinder.

As this optional feature of Glance may already have problems with 1, and we would need to at least forbid uploading encrypted images when the target format is vmdk (maybe also when the target format is qcow2) this would be a lot more of implementation work. So we will for now render this out of scope for the spec. Maybe this can be done after the image encryption is in place and if it is needed by operators and users.

Seems like this is part of the "Interoperable Image Import" [1], which references the following methods [2]:

  • glance-direct
  • web-download
  • copy-image
  • glance-download

This makes me wonder if this is applicable to images uploaded to Glance by Nova or Cinder at all. This is limited to images from external sources I think.

Nonetheless, at the latest when the user is initiating such an image upload we need to make sure that we account for this concerning the image encryption.
I agree that we should make it out of scope for the initial contribution in any case. Improving compatibility with advanced features later on after getting the base work done is always an option imho.

Footnotes

  1. https://docs.openstack.org/glance/latest/admin/interoperable-image-import.html#the-image-conversion

  2. https://docs.openstack.org/api-ref/image/v2/index.html#interoperable-image-import

@josephineSei
Contributor Author

I am looking into the Cinder side a little bit more, because we got a comment from Dan Smith regarding image conversion and Glance checking a few image parameters:

Again, you either need to call out that image conversion will not be possible on encrypted images, or say that glance will do those things if it has the key_id (and access to it). Also note that without access to the key, glance won't be able to do any sort of image inspection (for virtual_size, format confirmation, backing file rejection, etc). I think it's worth mentioning those drawbacks here.

Image conversion only seems to be triggered when a user uploads an image, not when Cinder or Nova upload one. Forbidding it would put more responsibility on the user side, but we also do that with the image encryption.

The check for the virtual size can be omitted, because we only allow two types of encrypted images, qcow2 and raw, and we want to introduce the "os_decrypt_size" parameter that should describe the size of the unencrypted image. We may even mandate using this parameter. The format of encrypted raw images can only be checked when decrypting the image.

So while this is a valid point to discuss, since Glance will reject images that do not have the format they claim to have, the Glance team did not want to have the power to encrypt or decrypt images back when we discussed gpg-based image encryption. I doubt that this has changed, and I also do not see a good reason why Glance should be able to do this.
Rather, we should discuss whether these checks need to take place, and whether they could be moved to either the client side or, better, to Cinder and Nova.
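
A metadata-only check of this kind, which needs no access to the key, could look roughly like the sketch below. The function validate_encrypted_image is hypothetical and not part of any patch; the only names taken from this discussion are the two allowed disk formats and the proposed "os_decrypt_size" property, which is treated as mandatory here purely as an assumption.

```python
ALLOWED_DISK_FORMATS = {"qcow2", "raw"}


def validate_encrypted_image(disk_format, properties):
    """Return a list of problems found in an encrypted image's metadata."""
    problems = []
    if disk_format not in ALLOWED_DISK_FORMATS:
        problems.append(f"disk_format {disk_format!r} is not qcow2 or raw")
    size = properties.get("os_decrypt_size")
    if size is None:
        problems.append("os_decrypt_size is missing")
    else:
        # Image properties are often stored as strings, so convert first.
        try:
            valid = int(size) > 0
        except (TypeError, ValueError):
            valid = False
        if not valid:
            problems.append(f"os_decrypt_size {size!r} is not a positive integer")
    return problems
```

Such a check could run on the client side before upload, or in Cinder and Nova before using the image.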

@josephineSei
Contributor Author

I edited the Glance spec to forbid image conversion more explicitly and incorporated some more comments on the Glance spec.

@josephineSei
Contributor Author

I adjusted the key management part of the spec to clarify the behavior of deleting keys, and added a part about putting images into ERROR state when they are encrypted and image conversion is enabled and would need to be done.
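
The intended decision can be summarized in a few lines. This is a simplified sketch of the rule as described above, not the Glance implementation; the function import_outcome, its arguments and its return values are all made up for illustration.

```python
def import_outcome(is_encrypted, conversion_enabled, disk_format, target_format):
    """Decide what happens to an uploaded image.

    Returns "error", "convert" or "store" (hypothetical labels).
    """
    needs_conversion = conversion_enabled and disk_format != target_format
    if is_encrypted and needs_conversion:
        # Encrypted images cannot be converted without the key.
        return "error"
    if needs_conversion:
        return "convert"
    return "store"
```

An encrypted image that already has the configured target format would still be stored as-is under this rule.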

@josephineSei
Contributor Author

I raised attention for all the patches again, in the Glance IRC channel and in the pop-up team meeting.

@markus-hentsch
Contributor

Cinder and Glance patchsets have been updated to only deprecate cinder_encryption_key_id and cinder_encryption_key_deletion_policy but still support them in addition to the new names for the deprecation period as requested by upstream.
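
A lookup that honors both names during the deprecation period could be sketched as follows. Note the assumptions: the new property names os_encrypt_key_id and os_encrypt_key_deletion_policy are placeholders for "the new names" and are not confirmed here, and get_encryption_property is a hypothetical helper, not code from the patchsets.

```python
# Maps assumed new property names to the deprecated cinder_* names.
DEPRECATED_ALIASES = {
    "os_encrypt_key_id": "cinder_encryption_key_id",
    "os_encrypt_key_deletion_policy": "cinder_encryption_key_deletion_policy",
}


def get_encryption_property(properties, new_name):
    """Return the value for new_name, falling back to its deprecated alias."""
    if new_name in properties:
        return properties[new_name]
    old_name = DEPRECATED_ALIASES.get(new_name)
    if old_name is not None and old_name in properties:
        return properties[old_name]
    return None
```

The new name wins when both are present, so images can be migrated property by property.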

@josephineSei
Contributor Author

I fixed some errors in the Glance db data-migration patch and looked through the Zuul logs for the Nova and Cinder patches.
All remaining tox errors are already solved by the merge of the os-brick patch. Maybe this will also solve the tempest errors. I rechecked the Nova patch, and we can do the same with the Cinder patch if this solves the issue.

@markus-hentsch
Contributor

Upstream, the Cinder Tempest tests in Zuul currently fail during volume cloning of encrypted volumes [1]:

cinder_tempest_plugin.scenario.test_volume_encrypted.TestEncryptedCinderVolumes.test_boot_cloned_encrypted_volume[compute,id-5bb622ab-5060-48a8-8840-d589a548b7e4,image,volume]
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    ...

      File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/cinder_tempest_plugin/scenario/test_volume_encrypted.py", line 162, in test_boot_cloned_encrypted_volume
    waiters.wait_for_volume_resource_status(

      File "/opt/stack/tempest/tempest/common/waiters.py", line 372, in wait_for_volume_resource_status
    raise exceptions.VolumeResourceBuildErrorException(

    tempest.exceptions.VolumeResourceBuildErrorException: volume 4bda115c-45e1-4785-a4f9-61f0a09cbe25 failed to build and is in ERROR status

Source code of the Tempest test: https://github.com/openstack/cinder-tempest-plugin/blob/1aa0a56fca85ce23343950431e28f41c4fa811c5/cinder_tempest_plugin/scenario/test_volume_encrypted.py#L150-L157

I started debugging this on my DevStack but so far haven't been able to reproduce a volume in ERROR state when cloning an existing encrypted one.

I happened to notice one difference though:

  1. if a volume is created from an encrypted image, the image's secret in Barbican is cloned, including its name
  2. if a volume originally created from an encrypted image is cloned, the secret is cloned once again, but this time its resulting name is set to None

I don't think cloning the secret name without any suffix (1) is desired at all because this is confusing for users when their image secret appears multiple times in Barbican by name.
I think we should change this.

This also seems to hint at differences of secret clone handling between (1) and (2).

I will keep investigating.

Footnotes

  1. https://zuul.opendev.org/t/openstack/build/df58b6af99de4e518f46a8530dd02baa/console

@josephineSei
Contributor Author

The recheck for the Nova patch led to a green Zuul pipeline.
I get two errors in tempest in the Glance db migration patch.


==============================
Failed 2 tests - output below:
==============================

tempest.api.image.v2.test_images_formats.ImagesFormatTest.test_accept_reject_formats_import[id-7c7c2f16-2e97-4dce-8cb4-bc10be031c85](vmdk-monolithicFlat-leak)
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/opt/stack/tempest/tempest/api/image/v2/test_images_formats.py", line 148, in test_accept_reject_formats_import
    self.client.delete_image(image['id'])

.....


tempest.api.image.v2.test_images_formats.ImagesFormatTest.test_accept_reject_formats_import[id-7c7c2f16-2e97-4dce-8cb4-bc10be031c85](vmdk-sparse-with-footer)
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/opt/stack/tempest/tempest/api/image/v2/test_images_formats.py", line 148, in test_accept_reject_formats_import
    self.client.delete_image(image['id'])

....

Neither should be related to the migration patch. I am still waiting for feedback from the Glance team, because the migration is not triggered on devstack.

@markus-hentsch
Contributor

I don't think cloning the secret name without any suffix (1) is desired at all because this is confusing for users when their image secret appears multiple times in Barbican by name.
I think we should change this.

I adjusted the Cinder patchset to remove the name when cloning a secret.

@markus-hentsch
Contributor

We got a review on the Glance patchset requesting the following changes:

  • add releasenotes entry
  • add unit test coverage for the castellan_exception.Forbidden branch of secret consumer registration
  • add unit test coverage for the ValueError branch of secret consumer registration
  • add "detailed documentation" (not specified further)

I'll look into those.

@markus-hentsch
Contributor

Possible changes to disk_format

Initially, we proposed to introduce a new container_format to represent encrypted images.
Glance was not a fan of this and requested that we not introduce new formats and instead rely solely on metadata for this [1].
Due to the recent OSSA-2024-001 and OSSA-2024-002 CVEs, the proper marking and scanning of image formats has gained new importance.
This now leads to a rethinking and new requests for changes in our approach.

@josephineSei and I couldn't participate in the weekly Glance IRC meeting on 2024-08-28 [2] due to a conflicting meeting, but the image encryption patchsets were discussed nonetheless and an interesting point was raised:

14:08:35 <pdeore> I have added few suggestions on parameter change patch but I request other cores to have a look at those patches
14:08:39 <dansmith> I feel like we need to revisit a couple things about how we store these images
14:08:49 <dansmith> in light of the giant CVE recently
14:09:07 <dansmith> in that I think we need to have a specific disk_format for luks-encrypted images,
14:09:31 <dansmith> so that we can inspect them with a known target format and reject things that are supposed to be encrypted but aren't (and v-v)
14:10:06 <dansmith> that goes with my proposal to also basically stop using "raw" to mean "image of a PC-like disk or partition"
14:10:17 <dansmith> (in my defender spec)
14:10:36 <dansmith> so I feel like we probably need to discuss that with glance, cinder, and nova people together
14:11:21 <dansmith> much of the complexity in the recent CVE came around the fact that we can never trust the disk_format in glance, and many of the side attack vectors came by putting one format in glance but calling it something else

Then we received the following comment on the Glance patchset from Dan Smith (Nova):

This is sort of an awkward place to raise this, but:

I think something we learned from the recent mega-CVE is that I think we need a new disk_format for luks-encrypted images. So much of the complexity of handling that CVE came from all the side vectors by which you can fool nova, glance, and cinder into doing something bad by saying an image is in one format, but actually sending another. Specifically, I think we have got to stop allowing raw to be both a catch-all format, as well as the thing we use when we really mean "an image of a PC-like disk".

Having to probe raw images to see if they smell like a LUKS disk, and if not, assume it's a regular raw is inviting more possibility for issue here I think.

I've got a spec proposed to make glance inspect and reject uploads that do not conform to the stated disk_format (i.e. you said it was raw, but uploaded a vmdk -> fail). It's here:

https://review.opendev.org/c/openstack/glance-specs/+/925111

and I have a LUKS inspector proposed against format_inspector:

https://review.opendev.org/c/openstack/oslo.utils/+/926809

those two things together will basically require that you declare a thing as luks before uploading it. One could also make the argument that we should use container_format=luks and disk_format=gpt for the typical arrangement, but I think that's more complicated. Either way, some discussion is required, IMHO.

As a result, we once again need to rethink and rework our encryption format approach in the Cinder and Glance patchsets to address this, it seems.

Footnotes

  1. https://github.com/SovereignCloudStack/standards/issues/560#issuecomment-2122616555

  2. https://meetings.opendev.org/meetings/glance/2024/glance.2024-08-29-14.00.log.html

@josephineSei
Contributor Author

josephineSei commented Sep 3, 2024

After reading through the Spec from Dan and looking into our own spec: https://review.opendev.org/c/openstack/glance-specs/+/915726/11/specs/2024.2/approved/glance/standardized_image_encryption.rst

I would like to propose an update to our specs for Cinder and Glance. We should definitely state it there, because people will find these specs also when searching for documentation.

Here is what we could do in Glance: https://review.opendev.org/c/openstack/glance-specs/+/927819/1/specs/2024.2/approved/glance/standardized_image_encryption.rst

@markus-hentsch
Contributor

In addition to your spec updates and Dan Smith's comment:

Either way, some discussion is required, IMHO.

I don't think we could or should go ahead and simply introduce some arbitrary disk_format in the patchsets now. We will need to reach an agreement with Glance and Cinder here on which exact changes would be okay for both sides, first.

I participated in today's Cinder IRC meeting, raised awareness of the topic again and asked people to attend next Monday's popup meeting.

@markus-hentsch
Contributor

markus-hentsch commented Sep 5, 2024

I attended today's Glance meeting: seems like we're back to the drawing board regarding the image metadata attributes associated with encrypted images and need to discuss this in the next PTG with the teams of Nova, Glance and Cinder:

14:07:28 <mhen> initially we agreed on using metadata only without touching container_format or disk_format for encrypted images, however dansmith raised a valid point about that being problematic for format detection/inspection
14:07:34 <mhen> considering the recent CVEs
14:07:48 <mhen> and suggested to introduce a new disk_format after all
14:08:23 <mhen> we have two cases to consider: 1) raw luks (like from cinder volumes) and 2) qcow2+luks
14:09:00 <dansmith> tbh, neither disk_format nor container_format really match in my head,
14:09:22 <dansmith> although qemu calls luks a disk format like qcow or vmdk, so it's probably a reasonable thing to mirror
14:09:52 <dansmith> but yeah, the recent CVEs (and the fallout since) has definitely affected my opinion on, well, a lot of things :/
14:10:38 <mhen> should that be two new disk formats then? "luks" and "qcow2+luks" for example or do we want to split the inner and outer format (raw/qcow, luks) into different attributes?
14:15:45 <pdeore> I think we should have the detailed discussion on this during PTG with glance, nova and cinder teams together
14:17:15 <mhen> if you have the time to attend, we can also start discussing this in the popup meeting on Monday: https://meetings.opendev.org/#Image_Encryption_Popup-Team_Meeting
14:17:34 <mhen> but yea, the PTG sounds like a good place to get everyone together for this
14:19:59 <pdeore> +1 to PTG, because not sure if people would be able to join popup meeting

Source: https://meetings.opendev.org/meetings/glance/2024/glance.2024-09-05-14.00.log.html

Update:

@josephineSei
Contributor Author

I think it would be good to have this conversation at the PTG in a cross-project session together with Cinder (and maybe Nova) to get everyone to agree on one way.

But even more important: I think we (I can try to do so) should reach out to Dan Smith to sketch out one or more possible ways of implementing the image encryption, all BEFORE the PTG happens. He is the one working on the image format checker, so he knows best what information is needed to verify that there is no malicious image upload.

@markus-hentsch
Contributor

@josephineSei and I held today's image encryption popup team meeting on IRC [1].
Although we did already agree on a PTG session about the topic as noted above, we wanted to discuss the topic in preparation to get the basics down.
Below is a short summary of the main talking points:

About formats:

  • recap: we have two major formats to consider: LUKS (raw) and qcow2+luks
  • fungi argued that qcow2+luks is not actually a new image type but only an implementation of additional feature support for disk_format=qcow2 (LUKS encryption support for qcow2 images)
    • we would have disk_format=qcow2 + os_encrypt_key_id=... marking such an image; it can also be identified by looking at the qcow2 header
  • the only new format is disk_format=luks which implements raw LUKS images
  • using container_format is out of the question, since Glance states: "The container format refers to whether the virtual machine image is in a file format that also contains metadata about the actual virtual machine." [2], which is inapplicable here
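To illustrate the point about identifying a qcow2+luks image from its header, here is a minimal sketch. It relies on the published qcow2 header layout (magic `QFI\xfb`, big-endian `crypt_method` field at byte offset 32, where 2 means LUKS). The function name is made up for this example and this is not the oslo.utils format inspector.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# crypt_method values defined by the qcow2 header layout
CRYPT_METHODS = {0: "none", 1: "aes", 2: "luks"}


def qcow2_crypt_method(header: bytes) -> str:
    """Return the encryption method recorded in a qcow2 header (hypothetical helper)."""
    if len(header) < 36 or header[:4] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 header")
    # All header fields are big-endian; crypt_method is a uint32 at bytes 32-35.
    (crypt,) = struct.unpack_from(">I", header, 32)
    return CRYPT_METHODS.get(crypt, "unknown")
```

Reading the first 36 bytes of an image file and passing them to `qcow2_crypt_method()` would be enough to cross-check an image that claims disk_format=qcow2 together with os_encrypt_key_id.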

About vulnerabilities:

  • handling raw LUKS images through qemu needs to be closely evaluated:
    1. a malicious VMDK image is encrypted with LUKS and stored as an image
    2. the image is transferred to a volume
    3. since Cinder natively supports LUKS, it is written 1:1 to the volume and not decrypted/converted
    4. when attaching the volume via qemu/libvirt using native LUKS, qemu and/or KVM will parse the LUKS header and handle encryption/decryption of blocks
    5. the question is, does the wrapped VMDK trigger anything within that qemu handling or is it invisible/untouched by qemu and directly passed to the guest kernel where it cannot reach outside the VM?

(past CVEs were based on qemu encountering a VMDK image with special instructions and executing them; but in this case, only the outer LUKS layer should actually be visible to qemu but that needs to be verified)

Footnotes

  1. https://meetings.opendev.org/meetings/image_encryption/2024/image_encryption.2024-09-09-13.00.log.html

  2. https://docs.openstack.org/glance/latest/user/formats.html

@josephineSei
Contributor Author

We took part in the PTG this week and we discussed with Cinder, Glance and some Nova people the impact of the CVEs on this topic.

Our discussion resulted in adjustments we need to do on the spec and the patches:

  1. We will introduce a new disk_format named LUKS for raw and gpt images. qcow2 images will keep their disk_format as the way encryption is handled differs between these images.
  2. The os_encrypt_format now describes the specific version of the mechanism used: LUKSv1 and maybe eventually LUKSv2
  3. We also discussed the image inspector, which may need an update to detect and handle the two kinds of encrypted images.

For now I adjusted the spec accordingly: https://review.opendev.org/c/openstack/glance-specs/+/927819

@markus-hentsch
Contributor

We took part in the PTG this week and we discussed with Cinder, Glance and some Nova people the impact of the CVEs on this topic.

Just to complement this, here are my notes from the meeting about what we agreed on:

  1. Extend the existing disk_format=qcow2 with feature support for qcow2+luks encryption.
  2. Add new disk_format=luks for raw LUKS images. An OpenStack developer noted: this should be compatible with container_format=compressed.
  3. For both disk_format=qcow2 and disk_format=luks the image metadata property os_encrypt_key_id should be used to reference the key in Barbican.
  4. For both disk_format=qcow2 and disk_format=luks the image metadata property os_encrypt_format will specify the LUKS version used: luksv1 or luksv2. Support in the image inspector may be added to verify this via the qcow or LUKS header.
  5. When Cinder consumes a qcow2+luks image, it is converted to raw LUKS.
  6. Barbican keys representing LUKS passphrases (referenced by os_encrypt_key_id) are handled according to their secret type (if "passphrase" then skip hexlify else do hexlify), which is implemented by the already merged os-brick patch. We keep this approach.
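The secret-type rule from item 6 can be sketched as follows. This is only an illustration of the "passphrase: skip hexlify, otherwise hexlify" decision, not the merged os-brick code; the function name and the string arguments are assumptions.

```python
import binascii


def luks_passphrase_from_secret(payload: bytes, secret_type: str) -> str:
    """Derive the LUKS passphrase from a Barbican secret payload (hypothetical helper)."""
    if secret_type == "passphrase":
        # Passphrase secrets are used as-is.
        return payload.decode("utf-8")
    # Any other secret type (e.g. a symmetric key) is hex-encoded first.
    return binascii.hexlify(payload).decode("ascii")
```

With this rule, a user-supplied passphrase secret and a Cinder-generated binary key can both be referenced through os_encrypt_key_id without the consumer needing any extra metadata.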

@markus-hentsch
Contributor

I was just starting to work on dissecting the hex values of a LUKS header in order to add proper image inspector support in Glance when I happened to notice that the implementation was moved to oslo.utils in August [1].

Furthermore, support for LUKSv1 header inspection for a luks disk_format was merged just a few days ago: openstack/oslo.utils@7ab82d4

It seems that this is a direct result of the PTG session and one less thing to consider in our patchsets.
I need to rebase the current patchsets first to catch all the latest changes though.
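For reference, the part of the LUKS header that matters for distinguishing versions is small. The sketch below is not the oslo.utils inspector; it only assumes the documented on-disk layout (6-byte magic `LUKS\xba\xbe` followed by a big-endian 16-bit version), and the helper names are invented for this example.

```python
import struct
from typing import Optional

LUKS_MAGIC = b"LUKS\xba\xbe"
# Mapping to the os_encrypt_format values discussed in this thread
ENCRYPT_FORMATS = {1: "LUKSv1", 2: "LUKSv2"}


def luks_version(header: bytes) -> Optional[int]:
    """Return the LUKS header version, or None if the magic does not match."""
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    (version,) = struct.unpack_from(">H", header, 6)
    return version


def matches_encrypt_format(header: bytes, os_encrypt_format: str) -> bool:
    """Check a declared os_encrypt_format against the header (hypothetical check)."""
    return ENCRYPT_FORMATS.get(luks_version(header)) == os_encrypt_format
```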

Footnotes

  1. https://github.com/openstack/glance/commit/0be2737d66ea90b64270e09402ea5b088a28f415

@markus-hentsch
Contributor

I revised the patchsets for Glance and Cinder today. I added disk_format=luks support and the LUKSv1 / LUKSv2 handling for os_encrypt_format in Cinder.

@josephineSei I removed os_encrypt_cipher from the implementation for now. We currently have two formats:

  • disk_format=qcow2 + os_encrypt_* metadata
  • disk_format=luks + os_encrypt_* metadata

In both cases, the os_encrypt_format can either be LUKSv1 or LUKSv2. The cipher details (e.g. AES-256, XTS mode) are details of the LUKS encryption and encoded directly in the LUKS header. LUKS can handle the header without prior knowledge of the cipher. As such, any os_encrypt_cipher metadata would be purely cosmetic but serve no purpose in processing the images as LUKS/qcow will read the header themselves.
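To show that the cipher details really are recoverable from the header alone, here is a small sketch that reads them from a LUKSv1 header. It assumes the LUKS1 on-disk layout (cipher name at offset 8, cipher mode at offset 40, hash spec at offset 72, each 32 bytes and NUL-padded, key length in bytes as a big-endian uint32 at offset 108). The function name is hypothetical and LUKSv2, which stores this information in a JSON area, is not covered.

```python
import struct


def luks1_cipher_info(header: bytes) -> dict:
    """Extract cipher details from a LUKSv1 header (illustrative helper)."""
    if len(header) < 112 or header[:6] != b"LUKS\xba\xbe":
        raise ValueError("not a LUKS header")
    (version,) = struct.unpack_from(">H", header, 6)
    if version != 1:
        raise ValueError("only LUKSv1 headers are handled in this sketch")

    def text(offset: int, length: int) -> str:
        # Fields are fixed-size and NUL-padded.
        return header[offset:offset + length].split(b"\0", 1)[0].decode("ascii")

    (key_bytes,) = struct.unpack_from(">I", header, 108)
    return {
        "cipher_name": text(8, 32),
        "cipher_mode": text(40, 32),
        "hash_spec": text(72, 32),
        "key_bits": key_bytes * 8,
    }
```

Since tools like cryptsetup and qemu read these fields themselves, a separate os_encrypt_cipher property could only ever duplicate (or contradict) them.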

As a result, I started wondering whether trying to properly identify the cipher used (either by inspecting the header in the OSC during user upload or trying to guess it from the Volume Type encryption metadata in case of Cinder's volume-to-image) is worth the risk of getting it wrong, considering it has no functional purpose and is just informational metadata at this point.

I think we could keep things a bit simpler by dropping os_encrypt_cipher and letting the LUKS tools do their job based on the LUKS header (which is the actual source of truth in this case), keeping only os_encrypt_format as a coarse differentiation for the image inspector.
@josephineSei what's your opinion on this?

Note: I still need to add os_decrypt_size support to Cinder.

@josephineSei
Contributor Author

I think you are correct, we do not lose information when removing os_encrypt_cipher. I will adjust the spec accordingly.

We still have the following metadata:

  • os_encrypt_format - the specific mechanism used, e.g. 'LUKSv1'
  • os_encrypt_key_id - reference to key in the key manager
  • os_encrypt_key_deletion_policy - on image deletion indicates whether the
    key should be deleted too
  • os_decrypt_container_format - format change, e.g. from compressed to
    bare
  • os_decrypt_format - the image format after decryption, when the
    disk_format is LUKS. It should be raw or gpt
  • os_decrypt_size - size after payload decryption

The ones that MUST be set are:

  • os_encrypt_format - the specific mechanism used, e.g. 'LUKSv1'
  • os_encrypt_key_id - reference to key in the key manager
  • os_decrypt_size - size after payload decryption

The other ones are:
The os_encrypt_key_deletion_policy defaults to FALSE, if I am reading the docs in the MR correctly.

The os_decrypt_container_format may only be needed, when the container format is changed, e.g. for compressed.
The os_decrypt_format is only required for the disk_format = LUKS.

I think the decrypt format is needed, but it is also new (due to the disk_format being LUKS), and not added yet to the patches, right? @markus-hentsch do you want to discuss this point? Maybe it is not needed, because we only allow raw/gpt images in LUKS blocks... But I am not sure about this.
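As a quick way to reason about the MUST-be-set list above, the check could look roughly like this. It is a sketch only: the function name is invented, and the set of accepted os_encrypt_format values (LUKSv1, LUKSv2) is taken from the earlier comments in this thread rather than from any merged code.

```python
REQUIRED_PROPERTIES = ("os_encrypt_format", "os_encrypt_key_id", "os_decrypt_size")
KNOWN_ENCRYPT_FORMATS = ("LUKSv1", "LUKSv2")


def check_encryption_metadata(props: dict) -> list:
    """Return a list of problems found in the image properties (hypothetical helper)."""
    problems = [f"missing {key}" for key in REQUIRED_PROPERTIES if key not in props]
    fmt = props.get("os_encrypt_format")
    if fmt is not None and fmt not in KNOWN_ENCRYPT_FORMATS:
        problems.append(f"unknown os_encrypt_format: {fmt}")
    return problems
```

An empty list means the three mandatory properties are present; the optional ones (os_encrypt_key_deletion_policy, os_decrypt_container_format) are deliberately not checked here.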

@josephineSei
Contributor Author

I tried to get more attention for all of this by adding reviewers to the patches.

@markus-hentsch
Contributor

I updated the Glance implementation patchset and addressed review comments by adding release notes and unit test coverage for secret consumer exceptions.

@markus-hentsch
Contributor

I started testing the changed architecture and ran into a failing openstack image create --disk-format luks because now that we moved to introducing a new disk_format for raw LUKS images after all, we need to adjust the OSC code.

For this purpose, I started a patchset for python-openstackclient, which I also labelled with the "LUKS-image-encryption" topic: https://review.opendev.org/c/openstack/python-openstackclient/+/934672

@josephineSei
Contributor Author

After our discussion about 'os_decrypt_format', I removed the mention of it from the spec and answered abhishek's comment on this. I also included that we expect all LUKS images to be 'raw' after encryption.

@markus-hentsch
Contributor

Cinder internally rewriting format detection from 'luks' to 'raw' for qemu-img conversions

During testing of the changed implementation, I discovered that the shift in direction to actually introduce disk_format=luks has more repercussions in Cinder than I originally thought. Cinder makes extensive use of qemu-img for format detection and conversion.

However, they use an override to treat images received from Glance that qemu-img detects as 'luks' as 'raw' instead because they expect those images to be originating from Cinder (former snapshots of LUKS volumes). The introduction of disk_format=luks now has side effects that will make Cinder erroneously attempt conversion:

  • check_image_format() will erroneously report a format mismatch because the image metadata states it is 'luks' but the aforementioned file detection override to 'raw' makes it look like raw
  • upload_to_image() (image from volume) will think that a conversion (encryption) from raw (volume) to luks (image) is necessary and attempts to encrypt the already encrypted data
  • fetch_to_volume_format() (volume from image) will think that a conversion from luks (image) to raw (volume) is necessary and attempts to decrypt the image data

In all cases, we actually have the same format on both sides (luks) but Cinder treats its own LUKS data (e.g. volumes) as 'raw'. We don't want any encryption or conversion here but simply have the data copied over.

At first I tried removing the override and let Cinder actually detect its own data as 'luks'. However, this led to some very tricky pitfalls because as soon as 'luks' is on some side of a qemu-img convert command (which Cinder uses a lot to automatically process necessary conversion between volumes and images) it will attempt encryption/decryption and fail because no passphrases are provided to method calls. There would have been numerous places in the code where special context-dependent handling would have been necessary to avoid qemu-img thinking it needs to encrypt/decrypt something.

It turned out to be actually easier to keep treating Cinder's internal data as 'raw' and have isolated instances of function flags that allow overriding the qemu-img behavior to treat the data as raw and simply copy it over.
I adjusted the Cinder patchset to implement that.
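The idea can be reduced to a format comparison with an explicit flag. The following is a simplified sketch with made-up names; it does not reflect the actual function signatures in Cinder's image utilities.

```python
def effective_format(fmt: str, treat_luks_as_raw: bool = True) -> str:
    """Map 'luks' to 'raw' when the data should be copied without touching encryption."""
    if treat_luks_as_raw and fmt == "luks":
        return "raw"
    return fmt


def needs_conversion(image_format: str, volume_format: str,
                     treat_luks_as_raw: bool = True) -> bool:
    """Decide whether a qemu-img convert step would be required (hypothetical helper)."""
    return (effective_format(image_format, treat_luks_as_raw)
            != effective_format(volume_format, treat_luks_as_raw))
```

With the flag set, a 'luks' image and a volume that Cinder internally regards as 'raw' compare as equal, so the data is copied 1:1. Without it, the same pair looks like a mismatch and would trigger an encryption or decryption attempt that has no passphrase available.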

@markus-hentsch
Contributor

I adjusted unit tests and added release notes to the Cinder patchset. Regular pipelines look good but IBM and StorPool integration tests fail.

@josephineSei I added the os_decrypt_format/gpt topic along with a regular update and pipeline questions to https://etherpad.opendev.org/p/cinder-epoxy-meetings for discussion tomorrow.

@markus-hentsch
Contributor

Compatibility with image compression (container_format=compressed)

Since I stumbled upon a few spots referencing image compression while writing the patchset in Cinder, I had a closer look at the topic and tested it with the image encryption introduced by the patchset(s).

NOTE: The OpenStack client does not allow specifying --container-format compressed for either image create --file or image create --volume. The image compression use case can only be triggered by interacting with the API directly, e.g., via curl! As such, it will be a niche use case, but it should still be checked for compatibility with the patchset.

Compressed image from volume

With the current patchset, Cinder rejects volume-to-image uploads that request the compressed container format:

VOLUME_NAME=test-luks-img
VOLUME_ID=$(openstack volume show -c id -f value $VOLUME_NAME)
PROJECT_ID=$(openstack token issue -f value -c project_id)
URL="http://192.168.144.111/volume/v3/$PROJECT_ID/volumes/$VOLUME_ID/action"
TOKEN=$(openstack token issue -f value -c id)

curl -X POST $URL -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
-d '{"os-volume_upload_image": {"force": false, "image_name": "compressed-luks-test", "container_format": "compressed", "disk_format": "luks"}}'

  {"badRequest": {"code": 400, "message": "An encrypted volume uploaded as an image
   must use 'luks' disk_format and 'bare' container_format."}}

Conclusion:
As openstack image create --volume does not allow using the compressed container_format, the only way to trigger this is a direct API call.
Furthermore, compressing an encrypted block device image will not yield any useful results, since well-encrypted data is effectively incompressible and will not get smaller when compressed after (!) encryption.
As such, I think this is a non-issue and disallowing compression here is not problematic.
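The compression argument is easy to check empirically. The sketch below uses random bytes from `os.urandom` as a stand-in for LUKS ciphertext (an assumption: good ciphertext is statistically indistinguishable from random data) and compares it with highly redundant data such as the zero-filled regions of an unencrypted disk.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


size = 1 << 20  # 1 MiB samples
plaintext_like = b"\0" * size        # e.g. empty space on an unencrypted disk
ciphertext_like = os.urandom(size)   # stand-in for encrypted blocks

print(f"redundant data: {gzip_ratio(plaintext_like):.4f}")
print(f"random data:    {gzip_ratio(ciphertext_like):.4f}")
```

The redundant sample shrinks to a tiny fraction of its size, while the random sample stays at roughly its original size (gzip framing can even make it slightly larger).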

Volume from compressed image

Since openstack image create --file cannot be used here, this is a bit more complicated to construct.
We need to create an empty image metadata object and then upload the compressed data to it.

# compress the image
cat cirros.raw.luks | gzip > cirros.raw.luks.gzip
file cirros.raw.luks.gzip
  cirros.raw.luks.gzip: gzip compressed data, from Unix, original size modulo 2^32 119508992

# retrieve Barbican key ID of the encryption key
SECRET_ID=$(openstack secret list -f value --name luks-image-passphrase \
    | head -n1 | cut -d' ' -f1 | rev | cut -d'/' -f1 | rev)
TOKEN=$(openstack token issue -f value -c id)

# create the empty image vessel with metadata
curl -X POST http://192.168.144.111/image/v2/images \
-H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
-d '{"disk_format": "luks", "name": "test-compressed-luks", "container_format": "compressed", "os_encrypt_key_id": "'$SECRET_ID'", "os_encrypt_key_deletion_policy": "none", "os_encrypt_format": "LUKSv1"}'
IMAGE_ID=$(openstack image show test-compressed-luks -f value -c id)

# upload the binary gzip-compressed data to the image vessel
python3 -c "import requests; headers = {'Content-Type': 'application/octet-stream', 'X-Auth-Token': '$TOKEN'}; r = requests.put('http://192.168.144.111/image/v2/images/$IMAGE_ID/file', data=open('cirros.raw.luks.gzip', 'rb'), headers=headers); print(r.status_code, r.text)"

# verify the data
openstack image save --file downloaded-compressed-image.gzip $IMAGE_ID
md5sum *.gzip
  366e855538951f13cd1ff40f28b17d65  cirros.raw.luks.gzip
  366e855538951f13cd1ff40f28b17d65  downloaded-compressed-image.gzip

# create the volume
openstack volume create --image $IMAGE_ID --size 1 --type lvmdriver-1-LUKS test-luks-gzip-img

Note that for binary image data upload, both curl -X PUT --data-binary and curl -X PUT --upload-file ended up with a 500 internal server error ("OSError: timeout during read(8192) on wsgi.input"). That's why I had to resort to the Python command instead for uploading the compressed data.
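For readability, the upload one-liner above can also be written out as a short script. The variant below is a sketch using only the standard library instead of requests; it builds the same PUT request against the image `/file` endpoint and sets Content-Length explicitly from the file size. The function name and its parameters are illustrative, and this variant was not part of the original test run.

```python
import os
import urllib.request


def build_image_upload_request(image_endpoint: str, image_id: str,
                               token: str, path: str) -> urllib.request.Request:
    """Build a PUT request that uploads `path` as the image data (hypothetical helper).

    The caller is responsible for closing `request.data` after sending.
    """
    url = f"{image_endpoint.rstrip('/')}/v2/images/{image_id}/file"
    headers = {
        "Content-Type": "application/octet-stream",
        "X-Auth-Token": token,
        # Explicit length, so the body is not sent with chunked encoding.
        "Content-Length": str(os.path.getsize(path)),
    }
    body = open(path, "rb")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")
```

Usage would be along the lines of `req = build_image_upload_request("http://192.168.144.111/image", IMAGE_ID, TOKEN, "cirros.raw.luks.gzip")` followed by `urllib.request.urlopen(req)`.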

Compression disabled in Cinder

If cinder.conf:allow_compression_on_image_upload is disabled (which is the default), the volume will enter the error state and cinder-volume will print the following log message:

cinder.exception.ImageUnacceptable: Image 058fdc89-8701-4581-a0ea-5e6be99dc4f2
is unacceptable: Image compression disallowed, but container_format is compressed.

Compression enabled in Cinder

If cinder.conf contains

allow_compression_on_image_upload = True
compression_format = gzip

... then the created volume will eventually reach available state and can be used to create a successfully booting VM in Nova:

openstack server create --flavor m1.tiny --network admin-private \
    --volume test-luks-gzip-img --security-group admin-access-group \
    vm-luksgzipimg-vol

openstack floating ip create --description "serverIPgzip" public
FIPGZIP=$(openstack floating ip list --long -f value -c Description -c "Floating IP Address" \
    | grep "serverIPgzip" | cut -d ' ' -f1)
openstack server add floating ip vm-luksgzipimg-vol $FIPGZIP
echo $FIPGZIP
ssh cirros@$FIPGZIP
  The authenticity of host '10.0.1.113 (10.0.1.113)' can't be established.
  ED25519 key fingerprint is SHA256:...
  This key is not known by any other names
  Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
  Warning: Permanently added '10.0.1.113' (ED25519) to the list of known hosts.
  cirros@10.0.1.113's password:
  $ hostname
  vm-luksgzipimg-vol

Conclusion:
The current patchset seems compatible with gzip-compressed images without issues.

@markus-hentsch
Contributor

I did some QA on the Cinder patchset.
I updated the patchset for Cinder and loosened the restriction on image-from-volume a bit: if the volume is encrypted, disk_format=raw is accepted at the API (like before) for the "os-volume_upload_image" action and the disk_format is internally rewritten to 'luks' automatically. This makes the API a bit more backwards compatible, as in disk_format=raw for "os-volume_upload_image" action will lead to LUKS-encrypted images just like before. It just adopts the new 'luks' disk_format when actually storing the image in Glance now to be in line with the standardization.
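Put as a small decision function, the API behavior described here looks roughly like the following. It is a sketch with invented names, not the code from the patchset; the error text is the one returned by the API in the compression test above.

```python
def resolve_upload_formats(volume_encrypted: bool, disk_format: str,
                           container_format: str) -> tuple:
    """Return the (disk_format, container_format) stored in Glance (hypothetical helper)."""
    if not volume_encrypted:
        return disk_format, container_format
    # Encrypted volumes: 'raw' is still accepted for backwards compatibility,
    # but the image is always stored as 'luks' with a 'bare' container.
    if disk_format not in ("raw", "luks") or container_format != "bare":
        raise ValueError(
            "An encrypted volume uploaded as an image must use "
            "'luks' disk_format and 'bare' container_format."
        )
    return "luks", "bare"
```

So `resolve_upload_formats(True, "raw", "bare")` yields `("luks", "bare")`, while requesting `container_format="compressed"` for an encrypted volume is rejected.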

I added a bunch of unit tests to cover the intricate parts of the patchset where we do special handling for the luks format internally. With that, test coverage of the new behaviors in Cinder should be quite good now.

@josephineSei
Contributor Author

I have looked into the Zuul tests and found one failing unit test. I tried to fix it and also put the image encryption on the Glance team's agenda to get more reviews and drive this forward.

@josephineSei
Contributor Author

We got reviews on the spec (with a +2 :)) and on the glance patch. We may need to split up the glance patch in smaller patches, but most of the comments were on smaller issues.

@josephineSei
Contributor Author

The patch set regarding the image encryption is in a good state:
https://review.opendev.org/q/topic:%22LUKS-image-encryption%22
We discussed several approaches with upstream, and at the last PTG only some details were redefined.
The implementation now mostly just needs reviews from the OpenStack development teams, and the patches may need some minor updates in response to them.
