Ephemeral mounts for NVMe data drives #147

Open · Zarquan opened this issue Jan 17, 2024 · 14 comments
Labels: enhancement, flavor, gaia

Zarquan (Collaborator) commented Jan 17, 2024

In order to test the performance of the NVMe data drives, how can we make them available in our OpenStack VMs?

Ideally we would mount them as separate disks in the VMs, but I'm not sure that is possible.

To start with, can we create some VM flavors that have large (~900 GB) ephemeral disks mapped onto the NVMe data drives? For example:

    {
      "ID": "....",
      "Name": "gaia.vm.26vcpu.916nvme",
      "RAM": 44032,
      "Disk": 20,
      "Ephemeral": 916,
      "VCPUs": 26
    }
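A flavor like this would normally be created with the OpenStack CLI; a minimal sketch using the values above (assumes admin credentials):

    # Sketch: flavor with a 20 GB root disk and a 916 GB ephemeral disk.
    openstack flavor create \
        --vcpus 26 \
        --ram 44032 \
        --disk 20 \
        --ephemeral 916 \
        gaia.vm.26vcpu.916nvme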
GregBlow self-assigned this Jan 17, 2024
GregBlow (Collaborator) commented:

Good morning,

These were installed on the supermicro hypervisors. A few mechanisms for segregating them were considered, including availability zoning, but it looks likely that the better way is to deploy them as ephemeral volumes on feature-restricted flavors.

Can you let us know the flavors of the VMs you intend to test these on, please?

Zarquan (Collaborator, Author) commented Jan 17, 2024

I agree that special flavors are probably the easiest way to manage them.

Colleagues from the STFC cloud at RAL recommended we try out Longhorn to aggregate ephemeral storage from a set of VMs into one large storage volume that can be mounted in Kubernetes as a persistent volume.
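For context, Longhorn pools the node-local disks of a Kubernetes cluster and exposes them as replicated persistent volumes. A minimal sketch of a StorageClass for such an experiment (the class name and parameter values here are illustrative assumptions, not a tested configuration):

    # Sketch: NVMe-backed Longhorn StorageClass (name and values are assumptions).
    cat <<'EOF' | kubectl apply -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: longhorn-nvme
    provisioner: driver.longhorn.io
    parameters:
      numberOfReplicas: "2"        # replica count to tolerate losing a VM
      staleReplicaTimeout: "30"    # minutes before an unhealthy replica is cleaned up
    EOF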

I'd like to test two different scenarios:

  1. Adding the NVMe disks to a special version of the 26-core flavor used for the Spark workers, handling the shared data store on the same VMs as the Spark workers:
    {
      "ID": "....",
      "Name": "gaia.vm.26vcpu.916nvme",
      "RAM": 44032,
      "Disk": 20,
      "Ephemeral": 916,
      "VCPUs": 26
    }
  2. Adding the ephemeral disks to a special version of the 4-core flavor that we can use to create a separate set of VMs specifically for handling the shared data:
    {
      "ID": "....",
      "Name": "gaia.vm.4vcpu.916nvme",
      "RAM": 6144,
      "Disk": 22,
      "Ephemeral": 916,
      "VCPUs": 4
    }

GregBlow (Collaborator) commented:

Good afternoon,

I have reconfigured the system's hypervisors with a new variety of labelling that allows flavours to be locked to specific hypervisors, and have provided the two new flavours specified.

It's an experimental system, so it might not behave precisely as it should, and it is subject to change, though preliminary tests look good. Could you please test and verify?

Regards,

Greg
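Flavor-to-hypervisor locking of this kind is commonly implemented in OpenStack with host aggregates plus flavor extra_specs. A sketch of the general pattern, not necessarily the exact configuration deployed here (the aggregate name and property key are illustrative):

    # Sketch: pin flavours to the NVMe hypervisors via a host aggregate.
    openstack aggregate create --property nvme=true nvme-hosts
    openstack aggregate add host nvme-hosts sv-sm-0-0
    # With AggregateInstanceExtraSpecsFilter enabled, only flavours carrying
    # the matching extra_spec are scheduled onto hosts in this aggregate.
    openstack flavor set \
        --property aggregate_instance_extra_specs:nvme=true \
        gaia.vm.26vcpu.916nvme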

GregBlow added the enhancement, flavor, gaia labels Feb 14, 2024
GregBlow (Collaborator) commented Feb 14, 2024

(Scratch that: flavours very like the ones you asked for do work. Trying to work out what differs.)

GregBlow (Collaborator) commented:

Oh, ephemeral volumes of 916 GB will not work: the hypervisors have 786 GB SSDs.

GregBlow (Collaborator) commented:

I've added a new set of flavours with 768 GB ephemeral volumes configured. It's easy to create more if you'd like different configurations.

Zarquan (Collaborator, Author) commented Feb 14, 2024

Thanks for setting it up. I have some work to finish on the Cambridge Arcus system, but then I hope to get a chance to experiment with Longhorn.

GregBlow (Collaborator) commented:

Note: there are 4 hypervisors with these SSDs mounted that are presently for your exclusive use. If your experiments require a larger number of volumes, we'll need to provision smaller flavours.

Zarquan (Collaborator, Author) commented Feb 19, 2024

Unable to experiment with the new flavors due to issues with the platform; see #144.

GregBlow (Collaborator) commented:

@DP-B21 Can you try creating instances with each of the two new flavours (they're locked to the Gaia project, so you'll need to create them under that project) and see whether they correctly place on the supermicro hypervisors, please?
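A sketch of that check (the image, network, and exact flavour/project names are placeholders; the OS-EXT-SRV-ATTR:host field is only visible with admin rights):

    # Sketch: boot a test instance under the Gaia project, then see where it landed.
    openstack --os-project-name gaia server create \
        --flavor gaia-4vcpu-768nvme \
        --image <image> \
        --network <network> \
        placement-test
    openstack server show placement-test -c OS-EXT-SRV-ATTR:host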

GregBlow (Collaborator) commented Jul 2, 2024

This has ceased to work (confirmed by myself and @DP-B21); the flavour gaia-4vcpu-916nvme goes to ERROR.

GregBlow (Collaborator) commented:

This is related to the new hypervisor flavour configuration.

GregBlow (Collaborator) commented Dec 4, 2024

Deployment of a tiny-flavour instance on the testbed AZ works correctly. The gaia flavour gaia-4vcpu-768nvme fails with:

    Traceback (most recent call last):
      File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/conductor/manager.py", line 705, in build_instances
        raise exception.MaxRetriesExceeded(reason=msg)
    nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 90497d0d-fe7a-409f-8697-ec458d3d38a8.
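MaxRetriesExceeded means the scheduler kept selecting hosts but every build attempt failed. One way to probe whether placement can satisfy the flavour's resource request at all is the osc-placement CLI plugin (a sketch; the DISK_GB figure assumes Nova's convention of counting root disk plus ephemeral, i.e. 22 + 768 GB here):

    # Sketch: which resource providers could satisfy this flavour's request?
    openstack allocation candidate list \
        --resource VCPU=4 \
        --resource MEMORY_MB=6144 \
        --resource DISK_GB=790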

GregBlow (Collaborator) commented Dec 4, 2024

2024-12-04 15:43:28.292 17 DEBUG nova.scheduler.manager [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Weighed [WeighedHost [host: (sv-sm-0-0, sv-sm-0-0) ram: 102157MB disk: 710656MB io_ops: 0 instances: 2, allocation_candidates: 1, weight: 1.9658051610201317], WeighedHost [host: (sv-sm-0-1, sv-sm-0-1) ram: 86029MB disk: 710656MB io_ops: 0 instances: 2, allocation_candidates: 1, weight: 1.901193141614091], WeighedHost [host: (sv-sm-0-2, sv-sm-0-2) ram: 249613MB disk: 762880MB io_ops: 0 instances: 0, allocation_candidates: 1, weight: -999997.0]] _get_sorted_hosts /var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/manager.py:730
2024-12-04 15:43:28.292 17 DEBUG nova.scheduler.utils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Attempting to claim resources in the placement API for instance 721f511b-88cd-4c93-bf5f-065c028db66c claim_resources /var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/utils.py:1283
2024-12-04 15:43:28.374 17 DEBUG nova.scheduler.manager [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] [instance: 721f511b-88cd-4c93-bf5f-065c028db66c] Selected host: (sv-sm-0-0, sv-sm-0-0) ram: 102157MB disk: 710656MB io_ops: 0 instances: 2, allocation_candidates: 1 _consume_selected_host /var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/manager.py:590
2024-12-04 15:43:28.374 17 DEBUG oslo_concurrency.lockutils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Acquiring lock "('sv-sm-0-0', 'sv-sm-0-0')" by "nova.scheduler.host_manager.HostState.consume_from_request.<locals>._locked" inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:402
2024-12-04 15:43:28.374 17 DEBUG oslo_concurrency.lockutils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Lock "('sv-sm-0-0', 'sv-sm-0-0')" acquired by "nova.scheduler.host_manager.HostState.consume_from_request.<locals>._locked" :: waited 0.000s inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:407
2024-12-04 15:43:28.375 17 DEBUG oslo_concurrency.lockutils [None req-37e0db8d-00d9-4306-8422-3774c90c82be 3ad62106189e426f87d3233161e060ec af29bdf3eb4d4467a855f54a0441d9ab - - default default] Lock "('sv-sm-0-0', 'sv-sm-0-0')" "released" by "nova.scheduler.host_manager.HostState.consume_from_request.<locals>._locked" :: held 0.000s inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:421
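Two tentative observations on this log: the very large negative weight on sv-sm-0-2 looks like Nova's BuildFailureWeigher, which subtracts build_failure_weight_multiplier (default 1000000.0) per recent failed build on a host, so that host may be deprioritised after an earlier failure; and the free-disk figures on the other two hosts (710656 MB, roughly 694 GB) would only fit the ~790 GB request if a disk allocation ratio above 1.0 is in effect. The hosts' actual inventories can be cross-checked against placement (requires the osc-placement plugin; <provider-uuid> is a placeholder):

    # Sketch: list resource providers, then inspect one host's DISK_GB inventory.
    openstack resource provider list
    openstack resource provider inventory list <provider-uuid>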
