[GSoC] Compatibility Changes in Trial Controller #2394
@andreyvelich @johnugeorge PTAL👀 if you are available. Thanks! ref issue: #2340 |
Thank you for this @Electronic-Waste!
cc @kubeflow/wg-automl-leads
/rerun-all |
/assign @johnugeorge @tenzen-y |
ACK |
Basically, lgtm
/lgtm |
/area gsoc |
Signed-off-by: Electronic-Waste <[email protected]>
@andreyvelich @tenzen-y I've fixed the flaky error by separating the UTs and adjusting the [...]. AFAIK, the flaky error is caused by the unpredictable number of times reconciliation is triggered, which in turn makes the number of calls to the function unpredictable. The most annoying issue was that we must call [...]. So I reserved some [...]
@kubeflow/wg-automl-leads it seems that the coverage report failed unexpectedly. Could you please rerun these test cases and check them again?
@Electronic-Waste I think you can re-trigger the tests by adding this comment:
/rerun-all |
Thanks @andreyvelich |
Not sure why coveralls fails on report tho. |
@andreyvelich AFAIK, it fails sometimes. It may return to normal in a few hours.
@andreyvelich There is an issue describing it: lemurheavy/coveralls-public#1716 It seems that the coverage report will fail if we rerun the old CI build: lemurheavy/coveralls-public#1716 (comment) |
Signed-off-by: Electronic-Waste <[email protected]>
@tenzen-y I think the flaky issue in the UTs has been solved now. Could you take a look and see if it looks good to you?
@tenzen-y FYR, the coverage report will fail if we rerun the old CI build. Maybe we should restart all CI builds and test them all. |
/rerun-all |
@tenzen-y The
Maybe this comment lemurheavy/coveralls-public#1716 (comment) is useful for reference. |
Signed-off-by: Electronic-Waste <[email protected]>
I made some tiny changes in the comment lines. It will retrigger all CI builds. Maybe this can help you check the robustness of UTs in |
/rerun-all |
@tenzen-y I'm sure that the flaky error has been addressed now. May I ask whether you need to check it again or not? I can trigger the CI builds again by pushing more tiny changes since the coverage report only works when we retrigger all CI builds. |
Thank you for driving this! Based on the CI results over the past 2 weeks, I am sure that we succeeded in getting rid of the root causes of the flakiness. https://github.com/kubeflow/katib/actions/workflows/test-go.yaml
Most things lgtm
Signed-off-by: Electronic-Waste <[email protected]>
Those were great improvements!
Thank you for doing this!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andreyvelich, tenzen-y
Thank you for your detailed review @tenzen-y! This PR is on hold now. Can you remove the [...]
Sure. |
What this PR does / why we need it:
I made some compatibility changes to the Trial Controller. Design details: https://github.com/kubeflow/katib/blob/master/docs/proposals/push-based-metrics-collection.md#compatibility-changes-in-trial-controller
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Checklist: