-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented a node replace flow #658
base: main
Are you sure you want to change the base?
Conversation
spec: | ||
name: "danylo-on-prem-cassandra" | ||
name: "rostyslp-on-prem-cassandra" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove names from samples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
controllers/clusters/on_premises.go
Outdated
} | ||
|
||
dvList := &cdiv1beta1.DataVolumeList{} | ||
err = k8sClient.List(context.TODO(), dvList, &client.ListOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change context.TODO()
to ctx
from functions arguments and change for all such cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
deleteUsed, err := deleteReplacedNodeResources(context.TODO(), r.Client, oldNode.ID) | ||
if err != nil { | ||
l.Error(err, "Cannot delete replaced node resources", | ||
"old node id", oldNode.ID) | ||
|
||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need this deleteUsed? err == nill
Isn't that an option for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to remove deleteUsed
var. What do you think about current approach?
controllers/clusters/on_premises.go
Outdated
objectName := strings.ToLower(b.K8sObject.GetName()) | ||
vm := &virtcorev1.VirtualMachine{} | ||
|
||
for i := 0; i < 20; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets put this 20
to the consts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
controllers/clusters/on_premises.go
Outdated
|
||
var svcName string | ||
if role == models.HeadlessServiceRoleLabel { | ||
svcName = fmt.Sprintf("%s-%s", role, strings.ToLower(b.K8sObject.GetName())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? strings.ToLower(b.K8sObject.GetName()))
it should also work with identical name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I can just use GetName()
controllers/clusters/on_premises.go
Outdated
func getReplacedNodes(oldNodesDC, newNodesDC *v1beta1.DataCentreStatus) (newNode, oldNode *v1beta1.Node) { | ||
for _, aNode := range oldNodesDC.Nodes { | ||
for _, bNode := range newNodesDC.Nodes { | ||
if aNode.ID == bNode.ID { | ||
oldNode = nil | ||
newNode = nil | ||
break | ||
} | ||
oldNode = aNode | ||
newNode = bNode | ||
} | ||
|
||
if oldNode != nil { | ||
return | ||
} | ||
} | ||
|
||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you simplify this func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
a225611
to
55b4430
Compare
55b4430
to
b967cfb
Compare
pkg/models/errors.go
Outdated
@@ -70,4 +70,5 @@ var ( | |||
ErrDebeziumImmutable = errors.New("debezium array is immutable") | |||
ErrCreateClusterWithMultiDC = errors.New("Multiple data center is still not supported. Please create a cluster with one data centre and add a second one when the cluster is in the running state") | |||
ErrOnPremicesWithMultiDC = errors.New("on-premises cluster can be provisioned with only one data centre") | |||
ErrGenerateAvailableVMNameFailed = errors.New("failed to generate available VM name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change to ErrGenerateVMName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -800,7 +800,67 @@ func (r *CassandraReconciler) newWatchStatusJob(c *v1beta1.Cassandra) scheduler. | |||
return err | |||
} | |||
|
|||
// TODO move everything not related to the watch status job to other places |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to create an issue for this and write a detailed description. For example what exactly is not related to the status job? What other places of unrelated things should be moved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created issue. #661
if !areStatusesEqual(&iCassandra.Status.ClusterStatus, &c.Status.ClusterStatus) { | ||
if len(c.Status.DataCentres) != 0 && | ||
len(iCassandra.Status.DataCentres) != 0 { | ||
bootstrap := newOnPremisesBootstrap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement seems to be missing one more condition for on-premises check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added condition to check if cluster running on-premises
err = deleteReplacedNodeResources(ctx, r.Client, oldNode.ID) | ||
if err != nil { | ||
if !k8serrors.IsNotFound(err) { | ||
l.Info("Node replace is in progress. Deleting old vm resources", | ||
"cluster name", c.Name, | ||
"old id", oldNode.ID) | ||
|
||
return nil | ||
} | ||
|
||
l.Error(err, "Cannot delete replaced node resources", | ||
"old node id", oldNode.ID) | ||
|
||
return err | ||
} | ||
|
||
l.Info("Node replace is in progress. Creating new resources", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error handling seems off, please refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
b967cfb
to
48325d5
Compare
controllers/clusters/on_premises.go
Outdated
|
||
return nil | ||
} | ||
|
||
return models.ErrGenerateVMName | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no node to replace, we return an error - I think this is a no good way to end the func? At least return an error like "No node to replace".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed code a bit, now should look better
48325d5
to
24db632
Compare
// Tries to delete old node resources and returns NotFound error if all resources already deleted | ||
func deleteReplacedNodeResources(ctx context.Context, k8sClient client.Client, oldID string) (deleteDone bool, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, refactor the doc for the func.
} | ||
} | ||
|
||
return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return true, nil
here?
svc := &k8scorev1.Service{} | ||
svcName := fmt.Sprintf("%s-%s", models.ClusterIPServiceRoleLabel, nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we set a new label for a ClusterIP service only? Should we set updated labels for the LB service if the cluster is a public one?
return vm, nil | ||
} | ||
|
||
return &vmList.Items[0], nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also check that len(vmList.Items) is equal to 1 maybe?
} | ||
if k8serrors.IsNotFound(err) { | ||
headlessSVC = newHeadlessService( | ||
return &dvList.Items[0], nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same check here
No description provided.