Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieve remote model id from registration response in IT to avoid flaky #3244

Conversation

zane-neo
Copy link
Collaborator

Description

Retrieve remote model id from registration response in IT to avoid flaky, an example is:

RestBedRockInferenceIT > test_bedrock_multimodal_model_empty_imageInput_null_textInput FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:45675/], URI [/_plugins/_ml/models/null/_deploy], status line [HTTP/1.1 404 Not Found]
    {"error":{"root_cause":[{"type":"status_exception","reason":"Failed to find model"}],"type":"status_exception","reason":"Failed to find model"},"status":404}
        at __randomizedtesting.SeedInfo.seed([7D8B7621F7A9A3F2:CB22567814C8B651]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:182)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:155)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:144)
        at app//org.opensearch.ml.rest.RestMLRemoteInferenceIT.deployRemoteModel(RestMLRemoteInferenceIT.java:1220)
        at app//org.opensearch.ml.rest.MLCommonsRestTestCase.registerRemoteModel(MLCommonsRestTestCase.java:1011)
        at app//org.opensearch.ml.rest.RestBedRockInferenceIT.test_bedrock_multimodal_model_empty_imageInput_null_textInput(RestBedRockInferenceIT.java:216)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@xinyual xinyual merged commit 1d30671 into opensearch-project:main Dec 3, 2024
9 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 3, 2024
dhrubo-os pushed a commit that referenced this pull request Dec 3, 2024
…aky (#3244) (#3249)

Signed-off-by: zane-neo <[email protected]>
(cherry picked from commit 1d30671)

Co-authored-by: zane-neo <[email protected]>
brianf-aws added a commit to brianf-aws/ml-commons that referenced this pull request Dec 5, 2024
Following opensearch-project#3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>
dhrubo-os pushed a commit that referenced this pull request Dec 6, 2024
…MTest (#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following #3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
brianf-aws added a commit to brianf-aws/ml-commons that referenced this pull request Dec 9, 2024
…MTest (opensearch-project#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following opensearch-project#3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 1a659c8)
brianf-aws added a commit to brianf-aws/ml-commons that referenced this pull request Dec 9, 2024
…MTest (opensearch-project#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following opensearch-project#3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 1a659c8)
tkykenmt pushed a commit to tkykenmt/ml-commons that referenced this pull request Dec 15, 2024
tkykenmt pushed a commit to tkykenmt/ml-commons that referenced this pull request Dec 15, 2024
…MTest (opensearch-project#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following opensearch-project#3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
tkykenmt pushed a commit to tkykenmt/ml-commons that referenced this pull request Dec 15, 2024
tkykenmt pushed a commit to tkykenmt/ml-commons that referenced this pull request Dec 15, 2024
…MTest (opensearch-project#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following opensearch-project#3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
dhrubo-os pushed a commit that referenced this pull request Dec 31, 2024
…tegrationWithLLMTest (#3263)

* Fixes Two Flaky IT classes RestMLGuardrailsIT & ToolIntegrationWithLLMTest (#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following #3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 1a659c8)

* add retry according to how many rest clients are in a IT cluster

Signed-off-by: Brian Flores <[email protected]>

* fix retry initialization

The MAX_RETRIES variable had to wait for the cluster to form before it could call to get the cluster size

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
@mingshl
Copy link
Collaborator

mingshl commented Jan 10, 2025

@pyek-bot @xinyual what versions should this fix go back to?

opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 10, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 10, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 10, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants