Fix model not deploy issue under intensive prediction tasks #1903
Merged
zane-neo
merged 1 commit into
opensearch-project:main
from
zane-neo:fix-model-not-deploy
Jan 26, 2024
Signed-off-by: zane-neo <[email protected]>
zane-neo
requested review from
b4sjoo,
dhrubo-os,
jngz-es,
model-collapse,
rbhavna,
ylwu-amzn,
Zhangxunmt,
austintlee and
HenryL27
as code owners
January 23, 2024 06:39
zane-neo
temporarily deployed
to
ml-commons-cicd-env
January 23, 2024 06:39 — with
GitHub Actions
Inactive
zane-neo
had a problem deploying
to
ml-commons-cicd-env
January 23, 2024 06:39 — with
GitHub Actions
Failure
zane-neo
temporarily deployed
to
ml-commons-cicd-env
January 23, 2024 06:40 — with
GitHub Actions
Inactive
zane-neo
had a problem deploying
to
ml-commons-cicd-env
January 23, 2024 06:40 — with
GitHub Actions
Failure
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1903      +/-   ##
============================================
+ Coverage     82.61%   82.63%   +0.01%
- Complexity     5383     5388       +5
============================================
  Files           521      521
  Lines         21715    21727      +12
  Branches       2210     2212       +2
============================================
+ Hits          17940    17954      +14
+ Misses         2878     2872       -6
- Partials        897      901       +4
Flags with carried forward coverage won't be shown.
ylwu-amzn
approved these changes
Jan 26, 2024
rbhavna
approved these changes
Jan 26, 2024
opensearch-trigger-bot bot
pushed a commit
that referenced
this pull request
Jan 26, 2024
Signed-off-by: zane-neo <[email protected]> (cherry picked from commit 521b880)
ylwu-amzn
pushed a commit
that referenced
this pull request
Jan 26, 2024
…1930) Signed-off-by: zane-neo <[email protected]> (cherry picked from commit 521b880) Co-authored-by: zane-neo <[email protected]>
5 tasks
austintlee
pushed a commit
to austintlee/ml-commons
that referenced
this pull request
Mar 19, 2024
…rch-project#1903) Signed-off-by: zane-neo <[email protected]>
Description
Under intensive prediction workloads, a "model not deployed" error can occasionally occur.
The cause is that SyncUpJob, when cleaning up timed-out model deployment tasks, iterates over all tasks without filtering out prediction tasks, while a prediction task can be removed from the cache as soon as its prediction finishes. If the corresponding taskCache entry has already been cleared, SyncUpJob hits an NPE and returns an error response for the gatherInfoRequest. In that case SyncUpJob sends a clearRoutingTable request, which clears all routingTable info, so the next prediction task finds no model in the cache and throws a "model needs deploy" exception.
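The race described above can be illustrated with a minimal, self-contained sketch. It is not the ml-commons code: the class `SyncUpSketch`, the `TaskType` and `TaskEntry` types, and the method `findTimedOutDeployTasks` are all hypothetical names chosen for illustration. It shows one plausible way to guard a cleanup pass, by skipping entries that have vanished from the cache and ignoring anything that is not a deploy task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from ml-commons.
final class SyncUpSketch {

    // Assumed task kinds; the real project has its own task type enum.
    enum TaskType { DEPLOY_MODEL, PREDICTION }

    // Assumed shape of a cached task entry.
    static final class TaskEntry {
        final TaskType type;
        final long startedAtMillis;

        TaskEntry(TaskType type, long startedAtMillis) {
            this.type = type;
            this.startedAtMillis = startedAtMillis;
        }
    }

    /**
     * Returns the ids of deploy tasks that have run longer than timeoutMillis.
     * taskIds is a snapshot taken earlier, so some ids may no longer be in
     * taskCache (for example, a prediction task that finished in the meantime).
     */
    static List<String> findTimedOutDeployTasks(
            Map<String, TaskEntry> taskCache,
            String[] taskIds,
            long nowMillis,
            long timeoutMillis) {
        List<String> expired = new ArrayList<>();
        for (String taskId : taskIds) {
            TaskEntry entry = taskCache.get(taskId);
            if (entry == null) {
                // Entry was removed concurrently; skip it instead of dereferencing null.
                continue;
            }
            if (entry.type != TaskType.DEPLOY_MODEL) {
                // Only deploy tasks are candidates for timeout cleanup.
                continue;
            }
            if (nowMillis - entry.startedAtMillis > timeoutMillis) {
                expired.add(taskId);
            }
        }
        return expired;
    }
}
```

With a snapshot containing a deploy task, a live prediction task, and a prediction task already evicted from the cache, only the deploy task is reported and no exception is thrown, so the caller has no error response that would trigger a routing table reset.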
Issues Resolved
NA
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.