[Core] Dashboard supports CRUD of virtual clusters #417

Open
Chong-Li opened this issue Dec 11, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request


Chong-Li commented Dec 11, 2024

Description

With the main functionality of virtual clusters (#409) implemented in GCS, we also need APIs to create, read, update, and delete (CRUD) virtual clusters.

These HTTP APIs can be exposed by the dashboard. Here is a quick prototype:

  • /virtual_clusters
    • POST
    • Use this API to create or update a virtual cluster
    • Example request, followed by the example reply:
{
   "virtualClusterId":"virtual_cluster_1",  // Unique id of the virtual cluster
   "allocationMode":"mixed",  // The running mode of jobs
   "replicaSets":{  // The node type (same as pod template id) and count that will be assigned to this virtual cluster
      "4c8g":1,
      "8c16g":1
   },
   "revision":1734141542694321600  // The timestamp of the virtual cluster's most recent creation/update
}
{
   "result":true,
   "msg":"Virtual cluster created or updated.",
   "data":{
      "virtualClusterId":"virtual_cluster_1",
      "revision":1734141542694433731,  // The timestamp that this creation/update was enforced in gcs
      "nodeInstances":{  // The nodes that were actually assigned to this virtual cluster
         "033141204224b43e67f01ec314ba45c16892298a23e83c5182eec355":{  // The node id used in gcs
            "hostname":"ec2-33-141-204-224.us-west-2.compute.amazonaws.com",
            "templateId":"4c8g"
         },
         "033159116236f3f382597f5e05cadbc000655f862f389c41072cef73":{
            "hostname":"ec2-33-159-116-236.us-west-2.compute.amazonaws.com",
            "templateId":"8c16g"
         }
      }
   }
}
  • Some notes:

    • The allocationMode field in the creation/update request specifies whether jobs running inside the virtual cluster are willing to share nodes with each other. This is a new feature that will be introduced later.
    • Always base an update on the latest version of the virtual cluster, since more than one party may be modifying it. First fetch the virtual cluster's most recent metadata, including its revision number, with the GET API below; then send your update request with that revision number.
  • /virtual_clusters/{virtual_cluster_id}

    • DELETE
    • Use this API to delete a virtual cluster
  • /virtual_clusters

    • GET
    • Use this API to get the metadata of each virtual cluster.
    • Example of the reply:
{
   "result":true,
   "msg":"All virtual clusters fetched.",
   "data":{
      "virtualClusters":[
         {
            "virtualClusterId":"virtual_cluster_1",
            "allocationMode":"mixed",
            "nodeInstances":{  // The nodes assigned to this virtual cluster
               "033141204224b43e67f01ec314ba45c16892298a23e83c5182eec355":{
                  "hostname":"ec2-33-141-204-224.us-west-2.compute.amazonaws.com",
                  "templateId":"4c8g"
               },
               "033159116236f3f382597f5e05cadbc000655f862f389c41072cef73":{
                  "hostname":"ec2-33-159-116-236.us-west-2.compute.amazonaws.com",
                  "templateId":"8c16g"
               }
            },
            "revision":1734141542694433731  // The timestamp of the virtual cluster's most recent creation/update
         },
         {
            "virtualClusterId":"virtual_cluster_2",
            "allocationMode":"exclusive",
            "nodeInstances":{
               "0331761541565ea3c14fcc158a98e9a6eed9e0c3c6c86fa613ce6738":{
                  "hostname":"ec2-33-176-154-156.us-west-2.compute.amazonaws.com",
                  "templateId":"8c16g"
               },
               "0331280722461e5130088465a89bd8262738fbd301ae9ae06e1edf42":{
                  "hostname":"ec2-33-128-72-246.us-west-2.compute.amazonaws.com",
                  "templateId":"4c8g"
               }
            },
            "revision":1734132897622670263
         }
      ]
   }
}
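
The read-then-update workflow described in the notes above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the dashboard address, the helper names (`http_transport`, `find_cluster`, `update_replica_sets`), and the choice to carry over the existing `allocationMode` are hypothetical and not part of the prototype. Only the paths and JSON field names come from the examples above. How the server answers a stale revision is not specified in this issue, so the sketch does not try to handle that case.

```python
import json
from typing import Callable, Optional
from urllib import request as urlrequest

# Assumption: the dashboard address is not given in the issue.
DASHBOARD_URL = "http://127.0.0.1:8265"

# A transport takes (method, path, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one JSON request to the dashboard and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urlrequest.Request(
        DASHBOARD_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_cluster(list_reply: dict, virtual_cluster_id: str) -> Optional[dict]:
    """Pick one virtual cluster out of the GET /virtual_clusters reply."""
    for cluster in list_reply["data"]["virtualClusters"]:
        if cluster["virtualClusterId"] == virtual_cluster_id:
            return cluster
    return None


def update_replica_sets(
    transport: Transport, virtual_cluster_id: str, replica_sets: dict
) -> dict:
    """Fetch the latest revision, then POST an update that is based on it."""
    listing = transport("GET", "/virtual_clusters", None)
    current = find_cluster(listing, virtual_cluster_id)
    if current is None:
        raise KeyError(f"unknown virtual cluster: {virtual_cluster_id}")
    payload = {
        "virtualClusterId": virtual_cluster_id,
        # Hypothetical choice: keep the allocation mode unchanged.
        "allocationMode": current["allocationMode"],
        "replicaSets": replica_sets,
        # The revision obtained from the GET reply is sent back unchanged.
        "revision": current["revision"],
    }
    return transport("POST", "/virtual_clusters", payload)
```

Against a running dashboard, a call might look like `update_replica_sets(http_transport, "virtual_cluster_1", {"4c8g": 2, "8c16g": 1})`. Passing the transport as a parameter keeps the revision logic testable without a live cluster.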

Use case

No response

Chong-Li added the enhancement label Dec 11, 2024
Chong-Li self-assigned this Dec 11, 2024
Chong-Li commented

@wumuzi520 any suggestions?


Chong-Li commented Dec 17, 2024

After some offline discussion:

  1. Remove virtual_cluster_name (not needed)
  2. Rename JobExecMode to AllocationMode, which specifies whether a node inside the virtual cluster is allocated exclusively to a single job or may be shared by multiple jobs
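
To make the intended meaning of AllocationMode concrete, here is a small Python sketch of the placement rule it implies. It is an illustration only, not the GCS implementation: the enum, the `can_place_job` helper, and the reading of "mixed" as "jobs may share a node" are assumptions based on the example values ("mixed", "exclusive") used in this issue.

```python
from enum import Enum
from typing import Set


class AllocationMode(Enum):
    # Values taken from the example payloads in this issue.
    EXCLUSIVE = "exclusive"  # a node serves at most one job
    MIXED = "mixed"  # assumed meaning: several jobs may share a node


def can_place_job(mode: AllocationMode, jobs_on_node: Set[str], job_id: str) -> bool:
    """Return True if job_id may use a node that already runs jobs_on_node."""
    if job_id in jobs_on_node:
        # The job already owns (or shares) this node.
        return True
    if mode is AllocationMode.MIXED:
        # Sharing is allowed, so any job may be added.
        return True
    # Exclusive mode: only an empty node can be handed to a new job.
    return not jobs_on_node
```

For example, `can_place_job(AllocationMode.EXCLUSIVE, {"job_a"}, "job_b")` rejects the placement, while the same call with `AllocationMode.MIXED` accepts it.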
