add tutorial for semantic search with byte quantized vector and Cohere embedding model #2127

Merged
merged 3 commits into from
Feb 22, 2024

Conversation

ylwu-amzn
Collaborator

…e embedding model

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


codecov bot commented Feb 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3b162db) 81.86% compared to head (20b79fe) 81.86%.
Report is 11 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2127      +/-   ##
============================================
- Coverage     81.86%   81.86%   -0.01%     
  Complexity     5644     5644              
============================================
  Files           543      543              
  Lines         22790    22800      +10     
  Branches       2333     2333              
============================================
+ Hits          18658    18665       +7     
- Misses         3195     3198       +3     
  Partials        937      937              
Flag Coverage Δ
ml-commons 81.86% <ø> (-0.01%) ⬇️


@ylwu-amzn ylwu-amzn temporarily deployed to ml-commons-cicd-env February 18, 2024 01:54 — with GitHub Actions Inactive
Contributor

@kolchfa-aws kolchfa-aws left a comment

Some minor comments.

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Yaliang Wu <[email protected]>
@ylwu-amzn ylwu-amzn temporarily deployed to ml-commons-cicd-env February 19, 2024 21:13 — with GitHub Actions Inactive
@ylwu-amzn ylwu-amzn temporarily deployed to ml-commons-cicd-env February 19, 2024 21:14 — with GitHub Actions Inactive
@ylwu-amzn ylwu-amzn temporarily deployed to ml-commons-cicd-env February 19, 2024 21:41 — with GitHub Actions Inactive

The Cohere Embed v3 model supports several `embedding_types`. This tutorial uses the `int8` type for byte-quantized vectors.

Note: Replace the placeholders that start with `your_` with your own values.

I suggest emphasizing the `your_xxx` placeholders in the code samples, for example in italic or boldface.

Collaborator Author

Doing this would pollute the REST API sample requests. I think we are fine leaving them as is.
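For context on the excerpt under discussion, here is a minimal sketch of a request that asks for `int8` embeddings while keeping a `your_` placeholder as plain text, plus a small check that lists placeholders a reader has not replaced yet. Only `embedding_types: ["int8"]` and the `your_` prefix come from the tutorial text; the function names, the `your_cohere_api_key` placeholder, the model name, and the `input_type` value are assumptions made for illustration.

```python
import json
import re

# Matches any token that still starts with the tutorial's `your_` prefix.
PLACEHOLDER = re.compile(r"\byour_\w+")


def build_embed_request(texts, api_key="your_cohere_api_key"):
    """Build headers and body for an embedding request (hypothetical helper).

    Assumptions: the model name and input_type below are illustrative;
    only `embedding_types: ["int8"]` is taken from the tutorial excerpt.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "texts": list(texts),
        "model": "embed-english-v3.0",      # assumed model name
        "input_type": "search_document",    # assumed input type
        "embedding_types": ["int8"],        # byte-quantized vectors
    }
    return headers, body


def find_placeholders(obj):
    """Return the sorted, de-duplicated `your_...` placeholders left in obj."""
    return sorted(set(PLACEHOLDER.findall(json.dumps(obj))))


if __name__ == "__main__":
    headers, body = build_embed_request(["hello world"])
    # Lists placeholders that still need real values before sending.
    print(find_placeholders({"headers": headers, "body": body}))
```

A check like this leaves the sample request itself untouched, which fits the author's preference for unformatted REST samples.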

]
},
{
"name": "sentence_embedding",

I think it's worth mentioning in the explanation that even though `inference_results.output.data_type` says `FLOAT32`, it's not representative of the embedding type defined in the connector (`int8`).

Collaborator Author

Makes sense.
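The point raised in this thread can be illustrated with a short sketch: an output entry labeled `FLOAT32` can still carry values that are whole numbers within the signed byte range. The `name` and `data_type` fields follow the excerpt and comment above; the `data` key, the sample values, and the `to_byte_vector` helper are assumptions for illustration, not the actual response format.

```python
def to_byte_vector(output):
    """Convert an output entry's values to ints, checking the int8 range.

    Raises ValueError if a value is not a whole number or falls outside
    the signed byte range [-128, 127].
    """
    vector = []
    for value in output["data"]:  # `data` key is an assumed field name
        as_int = int(value)
        if float(value) != as_int:
            raise ValueError(f"{value!r} is not a whole number")
        if not -128 <= as_int <= 127:
            raise ValueError(f"{value!r} is outside the int8 range")
        vector.append(as_int)
    return vector


# Illustrative entry: labeled FLOAT32, but every value fits in a signed byte.
sample_output = {
    "name": "sentence_embedding",
    "data_type": "FLOAT32",
    "data": [20.0, -11.0, -60.0, 127.0],  # made-up sample values
}

if __name__ == "__main__":
    print(to_byte_vector(sample_output))
```

Validating the range explicitly, rather than trusting the `data_type` label, is the design choice here, since the label describes the transport type and not the quantization configured in the connector.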

Signed-off-by: Yaliang Wu <[email protected]>
@ylwu-amzn ylwu-amzn temporarily deployed to ml-commons-cicd-env February 22, 2024 02:37 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os merged commit 7b60989 into opensearch-project:main Feb 22, 2024
13 of 15 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 22, 2024
…e embedding model (#2127)

* add tutorial for semantic search with byte quantized vector and Cohere embedding model

Signed-off-by: Yaliang Wu <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Yaliang Wu <[email protected]>

* address comments

Signed-off-by: Yaliang Wu <[email protected]>

---------

Signed-off-by: Yaliang Wu <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
(cherry picked from commit 7b60989)
ylwu-amzn added a commit that referenced this pull request Feb 22, 2024
…e embedding model (#2127) (#2149)

(cherry picked from commit 7b60989)

Co-authored-by: Yaliang Wu <[email protected]>
austintlee pushed a commit to austintlee/ml-commons that referenced this pull request Mar 19, 2024
…e embedding model (opensearch-project#2127)

7 participants