Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented a node replace flow #658

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RostislavPorohnya
Copy link
Contributor

No description provided.

@RostislavPorohnya RostislavPorohnya added the enhancement New feature or request label Dec 22, 2023
@RostislavPorohnya RostislavPorohnya self-assigned this Dec 22, 2023
spec:
name: "danylo-on-prem-cassandra"
name: "rostyslp-on-prem-cassandra"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove names from samples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

dvList := &cdiv1beta1.DataVolumeList{}
err = k8sClient.List(context.TODO(), dvList, &client.ListOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change context.TODO() to ctx from functions arguments and change for all such cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 819 to 827
deleteUsed, err := deleteReplacedNodeResources(context.TODO(), r.Client, oldNode.ID)
if err != nil {
l.Error(err, "Cannot delete replaced node resources",
"old node id", oldNode.ID)

return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need this deleteUsed? err == nill Isn't that an option for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to remove deleteUsed var. What do you think about current approach?

objectName := strings.ToLower(b.K8sObject.GetName())
vm := &virtcorev1.VirtualMachine{}

for i := 0; i < 20; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets put this 20 to the consts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


var svcName string
if role == models.HeadlessServiceRoleLabel {
svcName = fmt.Sprintf("%s-%s", role, strings.ToLower(b.K8sObject.GetName()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? strings.ToLower(b.K8sObject.GetName())) it should also work with identical name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I can just use GetName()

Comment on lines 978 to 1017
func getReplacedNodes(oldNodesDC, newNodesDC *v1beta1.DataCentreStatus) (newNode, oldNode *v1beta1.Node) {
for _, aNode := range oldNodesDC.Nodes {
for _, bNode := range newNodesDC.Nodes {
if aNode.ID == bNode.ID {
oldNode = nil
newNode = nil
break
}
oldNode = aNode
newNode = bNode
}

if oldNode != nil {
return
}
}

return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you simplify this func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@RostislavPorohnya RostislavPorohnya force-pushed the onpremises-provisioning-nodereplace branch from a225611 to 55b4430 Compare January 2, 2024 13:57
@RostislavPorohnya RostislavPorohnya force-pushed the onpremises-provisioning-nodereplace branch from 55b4430 to b967cfb Compare January 2, 2024 14:49
@@ -70,4 +70,5 @@ var (
ErrDebeziumImmutable = errors.New("debezium array is immutable")
ErrCreateClusterWithMultiDC = errors.New("Multiple data center is still not supported. Please create a cluster with one data centre and add a second one when the cluster is in the running state")
ErrOnPremicesWithMultiDC = errors.New("on-premises cluster can be provisioned with only one data centre")
ErrGenerateAvailableVMNameFailed = errors.New("failed to generate available VM name")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change to ErrGenerateVMName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -800,7 +800,67 @@ func (r *CassandraReconciler) newWatchStatusJob(c *v1beta1.Cassandra) scheduler.
return err
}

// TODO move everything not related to the watch status job to other places
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to create an issue for this and write a detailed description. For example what exactly is not related to the status job? What other places of unrelated things should be moved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue. #661

Comment on lines 804 to 808
if !areStatusesEqual(&iCassandra.Status.ClusterStatus, &c.Status.ClusterStatus) {
if len(c.Status.DataCentres) != 0 &&
len(iCassandra.Status.DataCentres) != 0 {
bootstrap := newOnPremisesBootstrap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement seems to be missing one more condition for on-premises check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added condition to check if cluster running on-premises

Comment on lines 820 to 837
err = deleteReplacedNodeResources(ctx, r.Client, oldNode.ID)
if err != nil {
if !k8serrors.IsNotFound(err) {
l.Info("Node replace is in progress. Deleting old vm resources",
"cluster name", c.Name,
"old id", oldNode.ID)

return nil
}

l.Error(err, "Cannot delete replaced node resources",
"old node id", oldNode.ID)

return err
}

l.Info("Node replace is in progress. Creating new resources",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error handling seems off, please refactor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@RostislavPorohnya RostislavPorohnya force-pushed the onpremises-provisioning-nodereplace branch from b967cfb to 48325d5 Compare January 3, 2024 12:04
controllers/clusters/cassandra_controller.go Show resolved Hide resolved
Comment on lines 168 to 178

return nil
}

return models.ErrGenerateVMName
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no node to replace, we return an error - I think this is a no good way to end the func? At least return an error like "No node to replace".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed code a bit, now should look better

@RostislavPorohnya RostislavPorohnya force-pushed the onpremises-provisioning-nodereplace branch from 48325d5 to 24db632 Compare January 3, 2024 13:50
Comment on lines +1019 to +1020
// Tries to delete old node resources and returns NotFound error if all resources already deleted
func deleteReplacedNodeResources(ctx context.Context, k8sClient client.Client, oldID string) (deleteDone bool, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, refactor the doc for the func.

}
}

return false, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return true, nil here?

Comment on lines +152 to +153
svc := &k8scorev1.Service{}
svcName := fmt.Sprintf("%s-%s", models.ClusterIPServiceRoleLabel, nodeName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we set a new label for a ClusterIP service only? Should we set updated labels for the LB service if the cluster is a public one?

return vm, nil
}

return &vmList.Items[0], nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also check that len(vmList.Items) is equal to 1 maybe?

}
if k8serrors.IsNotFound(err) {
headlessSVC = newHeadlessService(
return &dvList.Items[0], nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same check here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants