
RWX support and stability improvement #43

Merged: 5 commits into harvester:master from wip-rwx on Sep 12, 2024

Conversation

@Vicente-Cheng (Collaborator) commented Aug 29, 2024

This PR mixes two changes because I developed the RWX feature on top of the stability improvement.
I apologize for this. Since the reviewers are the same, please help to review both together!

stability: f6b6c5a
RWX: c28f8af

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Problem:

  • LH RWX volumes are not supported by the Harvester CSI driver
  • Volume operations on the downstream cluster can be unstable

Solution:

  • Support LH RWX volumes
  • Improve the stability of volume operations

Related Issue:
Stability:

RWX:

Test plan:


Volume Stability Test Plan

  1. Use the following YAML file to deploy 10 volumes on the guest cluster:
    https://gist.github.com/Vicente-Cheng/8e635a2c0ca649d7256419d61a680b3d
  2. Move this workload to another node.

We can increase the number of volumes (to 15, 20, 30, or 40) to further verify stability; a rough sketch of such a workload follows.
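
The gist is not reproduced here; as an illustration only (names, image, and sizes below are placeholders, not the actual gist contents), a workload of roughly this shape creates one volume per replica via a StatefulSet volumeClaimTemplate:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: volume-stability-test    # placeholder name
spec:
  serviceName: volume-stability-test
  replicas: 10                   # raise to 15/20/30/40 for heavier stress
  selector:
    matchLabels:
      app: volume-stability-test
  template:
    metadata:
      labels:
        app: volume-stability-test
    spec:
      containers:
      - name: sleeper
        image: busybox
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: harvester    # placeholder; use the guest cluster's Harvester CSI SC
      resources:
        requests:
          storage: 1Gi

Moving the workload can then be done by cordoning the current node and deleting the pods so they reschedule elsewhere:

$ kubectl cordon <node-name>
$ kubectl delete pod -l app=volume-stability-test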


RWX Support Test Plan

Prerequisites

  • Update the Harvester CSI driver RBAC as below:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
...
  name: harvesterhci.io:csi-driver
...
rules:
...
- apiGroups:
  - harvesterhci.io
  resources:
  - networkfilesystems
  - networkfilesystems/status
  verbs:
  - '*'
- apiGroups:
  - longhorn.io
  resources:
  - volumes
  - volumes/status
  verbs:
  - get
  - list

The networkfilesystems rules are for RWX support (this also requires the new controller: https://github.com/harvester/networkfs-manager). The volumes rules are for the stability checking.

  • Deploy the networkfs-manager
  1. Get the chart files from https://github.com/harvester/networkfs-manager/tree/main/deploy/charts
  2. Use the helm command to install:
$ helm install harvester-networkfs-manager charts/ -n harvester-system
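
As a quick sanity check (a generic kubectl query; the exact pod names depend on the chart), confirm the manager pods are running before moving on:

$ kubectl get pods -n harvester-system | grep networkfs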

Test Steps

Please complete the Prerequisites first

  1. Create a 1.4.x Harvester cluster
    1.1 (Optional) Create a storage network
    1.2 (Optional) Enable RWX volumes with the storage network on Longhorn (REF: https://longhorn.io/docs/1.7.0/advanced-resources/deploy/storage-network/#setting-storage-network-for-rwx-volumes)
  2. Create an RKE2 guest cluster
    2.1 (Optional) If the storage network is enabled for RWX volumes (on Longhorn), also add this storage network to the VMs of the guest cluster
  3. Replace the CSI driver image with the one from this PR
  4. Create a Longhorn RWX SC on the host cluster:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"
  5. Create a new SC associated with the above longhorn-rwx SC (on the guest cluster):
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-rwx
parameters:
  hostStorageClass: longhorn-rwx
provisioner: driver.harvesterhci.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
  6. Create the PVC with this new SC (remember to set the access mode to RWX; a sketch follows below)
  7. Verify that this PVC can be attached to multiple pods simultaneously.
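
The original step 6 referenced a screenshot of the PVC manifest; as a stand-in, a minimal PVC of this shape (the name and size are placeholders) requests an RWX volume from the new SC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test-pvc        # placeholder name
spec:
  accessModes:
  - ReadWriteMany           # the RWX access mode this test exercises
  storageClassName: longhorn-rwx
  resources:
    requests:
      storage: 1Gi

Two pods that both mount rwx-test-pvc should then be able to run simultaneously, even on different nodes.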

@Vicente-Cheng force-pushed the wip-rwx branch 5 times, most recently from 72cf02d to c28f8af on September 1, 2024 08:11
@Vicente-Cheng requested a review from bk201 on September 1, 2024 08:13
@tserong left a comment

LGTM, with the disclaimer that I'm not super-familiar with this part of the harvester codebase yet :-)

I wanted to check about the introduction of networkfs-manager. I assume we have this so that in future we can potentially support RWX volumes from non-Longhorn storage backends? My second question about that is: do we also need a PR against harvester to deploy that piece?

I also found https://docs.harvesterhci.io/v1.4/rancher/csi-driver/ which has the following note, which will want updating/removing once this feature goes in:

Currently, the Harvester CSI driver only supports single-node read-write(RWO) volumes. Please follow the issue #1992 for future multi-node read-only(ROX) and read-write(RWX) support.

@bk201 (Member) left a comment

LGTM, thanks!

@Vicente-Cheng (Collaborator, Author) commented

> LGTM, with the disclaimer that I'm not super-familiar with this part of the harvester codebase yet :-)
>
> I wanted to check about the introduction of networkfs-manager. I assume we have this so that in future we can potentially support RWX volumes from non-Longhorn storage backends? My second question about that is: do we also need a PR against harvester to deploy that piece?

Thanks @tserong,
Yes, the networkfs-manager should be able to handle external NFS as well, not only the Longhorn RWX volume.
We will support that in a later version.

You are correct. We need to make some changes on the Harvester side, such as adding this new component and the new RBAC configuration.
I will file another PR for both.

> I also found https://docs.harvesterhci.io/v1.4/rancher/csi-driver/ which has the following note, which will want updating/removing once this feature goes in:
>
> Currently, the Harvester CSI driver only supports single-node read-write(RWO) volumes. Please follow the issue #1992 for future multi-node read-only(ROX) and read-write(RWX) support.

Indeed, we also need to update the documentation with detailed examples of using RWX volumes on downstream clusters.

Thanks!

@WebberHuang1118 (Member) left a comment

LGTM, thanks for the PR. Will the Harvester chart be updated with networkfs-manager?

@Vicente-Cheng force-pushed the wip-rwx branch 2 times, most recently from 8ad6135 to c28f8af on September 11, 2024 15:12
    The Harvester CSI driver relies on a Longhorn volume for each RWO PVC
    on the downstream cluster. Currently, we do not check the volume
    status after removing it from the VM. That might trigger the Longhorn
    volume migration flow, which increases the potential risk.

    We need to make sure the corresponding volume is detached before we
    try to attach it (to the VM) again on the migration target node.
    Then we can avoid the LH volume migration, which should make
    workload migration on the downstream cluster more stable.

    So, at the ControllerPublish stage we check that the volume is either
    detached or already attached to the correct node.
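
One way to observe the behavior this commit describes (assuming kubectl access to the host cluster's longhorn-system namespace) is to watch the Longhorn volume state transitions while migrating a workload on the guest cluster:

$ kubectl -n longhorn-system get volumes.longhorn.io -w

The STATE column should move cleanly between attached and detached as the workload moves between nodes.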

Signed-off-by: Vicente Cheng <[email protected]>
Signed-off-by: Vicente Cheng <[email protected]>
@bk201 merged commit f1c4c34 into harvester:master on Sep 12, 2024
4 checks passed