DAOS-7485 control: Implement system reint to act on all pools #15551
Ticket title is 'dmg command to drain and reintegrate nodes from all pools'
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/357/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/354/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/273/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/304/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/480/log
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/519/log
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/375/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/387/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/360/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/369/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/359/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/364/log
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15551/5/testReport/
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
45f3e40 to d96bbfd
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15551/6/testReport/
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
…-stack/daos into tanabarr/control-reintpools-pernode Features: control Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Features: control Signed-off-by: Tom Nabarro <[email protected]>
Features: control pool Signed-off-by: Tom Nabarro <[email protected]>
Test stage Unit Test bdev on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15551/16/display/redirect
Test stage Unit Test with memcheck on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15551/16/display/redirect
Test stage Unit Test on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15551/16/display/redirect
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15551/16/display/redirect
Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15551/16/display/redirect
Features: control pool Signed-off-by: Tom Nabarro <[email protected]>
This PR is failing the copyright check, but I didn't want to rerun all the code through the new hooks, as it would probably require a force push and inconvenience reviewers. @daltonbohning does that sound reasonable, or should I work out how to update the copyright notices?
gatekeeper please use PR title and description as commit message when landing
Yeah, the copyright hook is not foolproof and it's tricky because the copyright is more about when the work was done, not when the commit was merged to master. I don't know the right answer, but if you do want to update them as if all this work was done in 2025, here is a commit for that: (I had to do some manual trickery to get this) The hook should update the copyright for any new changes on this PR, assuming you have the hooks set up.
…intpools-pernode Features: pool Signed-off-by: Tom Nabarro <[email protected]>
Features: pool Signed-off-by: Tom Nabarro <[email protected]>
Test stage Unit Test on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15551/20/display/redirect
@@ -134,7 +134,7 @@ func (m MgmtMethod) String() string {
MethodPoolExclude: "PoolExclude",
MethodPoolDrain: "PoolDrain",
MethodPoolExtend: "PoolExtend",
MethodPoolReintegrate: "PoolReintegrate",
MethodPoolReint: "PoolReint",
Just FYI, I think this line still needs to be fixed. These names and strings are totally internal to Go.
DRPC_METHOD_MGMT_REINTEGRATE = 226,
DRPC_METHOD_MGMT_CONT_SET_OWNER = 227,
DRPC_METHOD_MGMT_EXCLUDE = 228,
DRPC_METHOD_MGMT_EXTEND = 229,
DRPC_METHOD_MGMT_POOL_EVICT = 230,
DRPC_METHOD_MGMT_DRAIN = 231,
What's the motivation for renumbering these? It shouldn't cause any problems since daos_server and daos_engine must be the same version, but... why? If you're just standardizing the naming without changing the meaning, IMO it's better to rename in place, with the same numbers.
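The "rename in place, with the same numbers" suggestion can be sketched roughly as follows. This is a Go illustration with hypothetical identifiers (`MethodID` and the `Method*` constants), not the C enum from the DAOS source, and the numeric values are borrowed from the hunk above purely as examples. The point is that each opcode carries an explicit value, so changing an identifier never changes the number it maps to.

```go
package main

import "fmt"

// MethodID is a hypothetical stand-in for a dRPC method opcode.
type MethodID int32

// Every constant carries an explicit value instead of relying on iota,
// so renaming or reordering entries cannot shift a number.
const (
	MethodReintegrate MethodID = 226 // the identifier may change; 226 stays
	MethodExclude     MethodID = 228
	MethodExtend      MethodID = 229
	MethodDrain       MethodID = 231
)

// methodNames holds the human-readable name for each opcode.
var methodNames = map[MethodID]string{
	MethodReintegrate: "Reintegrate",
	MethodExclude:     "Exclude",
	MethodExtend:      "Extend",
	MethodDrain:       "Drain",
}

// String implements fmt.Stringer, falling back to the raw number
// for opcodes that have no entry in methodNames.
func (m MethodID) String() string {
	if name, ok := methodNames[m]; ok {
		return name
	}
	return fmt.Sprintf("MethodID(%d)", int32(m))
}

func main() {
	fmt.Println(int32(MethodReintegrate), MethodReintegrate)
}
```

With this layout a rename touches only the identifier and, if desired, its display string, which keeps the diff small and leaves the numeric contract alone.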
Doc-only: false Signed-off-by: Tom Nabarro <[email protected]>
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/21/execution/node/1453/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/21/execution/node/1470/log
@kjacque I would like to get this PR landed, and I don't think we should block on variable renaming issues. I have reverted most of the Go name changes, and I don't think reverting the dRPC method names warrants another run through CI. If it's okay with you, can we go ahead with this version of the PR and move on to bigger fish? TIA
CI failures all attributable to DAOS-16921
The dRPC opcode renumbering is the only thing that really bothers me, but I can let it go because I can't see a way that it could break actual operation. daos_server and daos_engine must be the same version, and these opcodes only communicate between server and engine. That said, I think it's generally a bad idea to renumber these method IDs. We treat them as an API.
I don't think that this should be landed without reverting the truncated names in the Go code. All of the renaming makes the patch larger than necessary, and it introduces inconsistency in the naming. If you want to have shorter names on the C side to conform to those conventions, that's fine, but there's no good reason to impose the C conventions on the Go code.
To be clear: "Reintegrate" -> "Reint" is a disimprovement. Please revert those changes specifically.
Add dmg system reint command to reintegrate a set of storage nodes or
ranks from all the pools they belong to. Takes --ranks or --rank-hosts in
ranged format.
Required-githooks: true
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: