Ephemeral mounts for NVMe data drives #147

Open · Zarquan opened this issue Jan 17, 2024 · 14 comments
Labels: enhancement, flavor, gaia

Zarquan (Collaborator) commented Jan 17, 2024

In order to test the performance of the NVMe data drives, how can we make them available in our OpenStack VMs?

Ideally we would mount them as separate disks in the VMs, but I'm not sure that is possible.

To start with, can we create some VM flavors that have large (~900 GB) ephemeral disks mapped onto the NVMe data drives? For example:

    {
      "ID": "....",
      "Name": "gaia.vm.26vcpu.916nvme",
      "RAM": 44032,
      "Disk": 20,
      "Ephemeral": 916,
      "VCPUs": 26
    }
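A flavor like this would normally be created with the OpenStack CLI; a minimal sketch using the values above (assumes admin credentials):

    # Sketch: flavor with a 20 GB root disk and a 916 GB ephemeral disk.
    openstack flavor create \
        --vcpus 26 \
        --ram 44032 \
        --disk 20 \
        --ephemeral 916 \
        gaia.vm.26vcpu.916nvme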
GregBlow self-assigned this Jan 17, 2024
GregBlow (Collaborator) commented:

Good morning,

These were installed on the supermicro hypervisors. A few mechanisms for segregating them were considered, including availability zoning, but it looks likely that the better way is to deploy them as ephemeral volumes on feature-restricted flavors.

Can you let us know the flavors of the VMs you intend to test these on, please?

Zarquan (Collaborator, Author) commented Jan 17, 2024

I agree that special flavors are probably the easiest way to manage them.

Colleagues from the STFC cloud at RAL recommended we try out Longhorn to aggregate ephemeral storage from a set of VMs into one large storage volume that can be mounted in Kubernetes as a persistent volume.
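For context, Longhorn pools the node-local disks of a Kubernetes cluster and exposes them as replicated persistent volumes. A minimal sketch of a StorageClass for such an experiment (the class name and parameter values here are illustrative assumptions, not a tested configuration):

    # Sketch: NVMe-backed Longhorn StorageClass (name and values are assumptions).
    cat <<'EOF' | kubectl apply -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: longhorn-nvme
    provisioner: driver.longhorn.io
    parameters:
      numberOfReplicas: "2"        # replica count to tolerate losing a VM
      staleReplicaTimeout: "30"    # minutes before an unhealthy replica is cleaned up
    EOF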

I'd like to test two different scenarios:

  1. Adding the NVMe disks to a special version of the 26-core flavor used for the Spark workers, handling the shared data store on the same VMs as the Spark workers:
    {
      "ID": "....",
      "Name": "gaia.vm.26vcpu.916nvme",
      "RAM": 44032,
      "Disk": 20,
      "Ephemeral": 916,
      "VCPUs": 26
    }
  2. Adding the ephemeral disks to a special version of the 4-core flavor that we can use to create a separate set of VMs specifically for handling the shared data:
    {
      "ID": "....",
      "Name": "gaia.vm.4vcpu.916nvme",
      "RAM": 6144,
      "Disk": 22,
      "Ephemeral": 916,
      "VCPUs": 4
    }

GregBlow (Collaborator) commented:

Good afternoon,

I have reconfigured the system's hypervisors with a new variety of labelling that allows flavours to be locked to specific hypervisors, and have provided the two new flavours specified.

It's an experimental system, so it might not behave precisely as it should, and it is subject to change, though preliminary tests look good. Could you please test and verify?

Regards,

Greg
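Flavor-to-hypervisor locking of this kind is commonly implemented in OpenStack with host aggregates plus flavor extra_specs. A sketch of the general pattern, not necessarily the exact configuration deployed here (the aggregate name and property key are illustrative):

    # Sketch: pin flavours to the NVMe hypervisors via a host aggregate.
    openstack aggregate create --property nvme=true nvme-hosts
    openstack aggregate add host nvme-hosts sv-sm-0-0
    # With AggregateInstanceExtraSpecsFilter enabled, only flavours carrying
    # the matching extra_spec are scheduled onto hosts in this aggregate.
    openstack flavor set \
        --property aggregate_instance_extra_specs:nvme=true \
        gaia.vm.26vcpu.916nvme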

GregBlow added the enhancement, flavor, gaia labels Feb 14, 2024
GregBlow (Collaborator) commented Feb 14, 2024

(Scratch that: flavours very like the ones you asked for do work. Trying to work out what differs.)

GregBlow (Collaborator) commented:

Oh, ephemeral volumes of 916 GB will not work: the hypervisors have 786 GB SSDs.

GregBlow (Collaborator) commented:

I've added a new set of flavours with 768 GB ephemeral volumes configured. It's easy to create more if you'd like different configurations.

Zarquan (Collaborator, Author) commented Feb 14, 2024

Thanks for setting it up. I have some work to finish on the Cambridge Arcus system, but then I hope to get a chance to experiment with Longhorn.

GregBlow (Collaborator) commented:

Note: there are 4 hypervisors with these SSDs mounted that are presently for your exclusive use. If your experiments require a larger number of volumes, we'll need to provision smaller flavours.

Zarquan (Collaborator, Author) commented Feb 19, 2024

Unable to experiment with the new flavors due to issues with the platform; see #144.

GregBlow (Collaborator) commented:

@DP-B21 Can you try creating instances with each of the two new flavours (they're locked to the Gaia project, so you'll need to create them under that project) and see whether they correctly place on the supermicro hypervisors, please?
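A sketch of that check (the image, network, and exact flavour/project names are placeholders; the OS-EXT-SRV-ATTR:host field is only visible with admin rights):

    # Sketch: boot a test instance under the Gaia project, then see where it landed.
    openstack --os-project-name gaia server create \
        --flavor gaia-4vcpu-768nvme \
        --image <image> \
        --network <network> \
        placement-test
    openstack server show placement-test -c OS-EXT-SRV-ATTR:host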

GregBlow (Collaborator) commented Jul 2, 2024

This has ceased to work (confirmed by myself and @DP-B21); the flavour gaia-4vcpu-916nvme goes to ERROR.

GregBlow (Collaborator) commented:

This is related to the new hypervisor flavour configuration.

GregBlow (Collaborator) commented Dec 4, 2024

Deployment of a tiny-flavour instance on the testbed AZ works correctly. The gaia flavour gaia-4vcpu-768nvme fails with:

    Traceback (most recent call last):
      File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/conductor/manager.py", line 705, in build_instances
        raise exception.MaxRetriesExceeded(reason=msg)
    nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 90497d0d-fe7a-409f-8697-ec458d3d38a8.
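MaxRetriesExceeded means the scheduler kept selecting hosts but every build attempt failed. One way to probe whether placement can satisfy the flavour's resource request at all is the osc-placement CLI plugin (a sketch; the DISK_GB figure assumes Nova's convention of counting root disk plus ephemeral, i.e. 22 + 768 GB here):

    # Sketch: which resource providers could satisfy this flavour's request?
    openstack allocation candidate list \
        --resource VCPU=4 \
        --resource MEMORY_MB=6144 \
        --resource DISK_GB=790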

GregBlow (Collaborator) commented Dec 4, 2024

2024-12-04 15:43:28.292 17 DEBUG nova.scheduler.manager [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Weighed [WeighedHost [host: (sv-sm-0-0, sv-sm-0-0) ram: 102157MB disk: 710656MB io_ops: 0 instances: 2, allocation_candidates: 1, weight: 1.9658051610201317], WeighedHost [host: (sv-sm-0-1, sv-sm-0-1) ram: 86029MB disk: 710656MB io_ops: 0 instances: 2, allocation_candidates: 1, weight: 1.901193141614091], WeighedHost [host: (sv-sm-0-2, sv-sm-0-2) ram: 249613MB disk: 762880MB io_ops: 0 instances: 0, allocation_candidates: 1, weight: -999997.0]] _get_sorted_hosts /var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/manager.py:730
2024-12-04 15:43:28.292 17 DEBUG nova.scheduler.utils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Attempting to claim resources in the placement API for instance 721f511b-88cd-4c93-bf5f-065c028db66c claim_resources /var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/utils.py:1283
2024-12-04 15:43:28.374 17 DEBUG nova.scheduler.manager [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] [instance: 721f511b-88cd-4c93-bf5f-065c028db66c] Selected host: (sv-sm-0-0, sv-sm-0-0) ram: 102157MB disk: 710656MB io_ops: 0 instances: 2, allocation_candidates: 1 _consume_selected_host /var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/manager.py:590
2024-12-04 15:43:28.374 17 DEBUG oslo_concurrency.lockutils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Acquiring lock "('sv-sm-0-0', 'sv-sm-0-0')" by "nova.scheduler.host_manager.HostState.consume_from_request.<locals>._locked" inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:402
2024-12-04 15:43:28.374 17 DEBUG oslo_concurrency.lockutils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Lock "('sv-sm-0-0', 'sv-sm-0-0')" acquired by "nova.scheduler.host_manager.HostState.consume_from_request.<locals>._locked" :: waited 0.000s inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:407
2024-12-04 15:43:28.375 17 DEBUG oslo_concurrency.lockutils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Lock "('sv-sm-0-0', 'sv-sm-0-0')" "released" by "nova.scheduler.host_manager.HostState.consume_from_request.<locals>._locked" :: held 0.000s inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:421
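Two tentative observations on this log: the very large negative weight on sv-sm-0-2 looks like Nova's BuildFailureWeigher, which subtracts build_failure_weight_multiplier (default 1000000.0) per recent failed build on a host, so that host may be deprioritised after an earlier failure; and the free-disk figures on the other two hosts (710656 MB, roughly 694 GB) would only fit the ~790 GB request if a disk allocation ratio above 1.0 is in effect. The hosts' actual inventories can be cross-checked against placement (requires the osc-placement plugin; <provider-uuid> is a placeholder):

    # Sketch: list resource providers, then inspect one host's DISK_GB inventory.
    openstack resource provider list
    openstack resource provider inventory list <provider-uuid>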
