Is your feature request related to a problem?
In OpenSearch we support some sentence-transformers models as pretrained models. Registering a pretrained model is much more convenient, and users don't need to change the cluster setting plugins.ml_commons.allow_registering_model_via_url.
With the research and engineering progress in the IR domain, there are now much stronger text_embedding models in the open-source community (leaderboard ref). However, users still need to trace these models and generate the tarball manually, which is a heavy workload, especially for those with little machine-learning background.
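For reference, the manual path looks roughly like this today. This is a minimal sketch, assuming the model has already been traced to TorchScript and packaged as a zip hosted somewhere reachable; the URL and hash values are placeholders.

```python
import requests

OPENSEARCH = "http://localhost:9200"

# Step 1: relax the cluster setting that guards URL-based registration.
requests.put(
    f"{OPENSEARCH}/_cluster/settings",
    json={"persistent": {"plugins.ml_commons.allow_registering_model_via_url": True}},
).raise_for_status()

# Step 2: register the hand-built tarball via URL.
requests.post(
    f"{OPENSEARCH}/_plugins/_ml/models/_register",
    json={
        "name": "bge-small-en-v1.5",
        "version": "1.0.0",
        "model_format": "TORCH_SCRIPT",
        "model_content_hash_value": "<sha256-of-the-zip>",  # placeholder
        "model_config": {
            "model_type": "bert",  # BGE models are BERT-based
            "embedding_dimension": 384,  # 384 for bge-small-en-v1.5
            "framework_type": "sentence_transformers",
        },
        "url": "https://example.com/bge-small-en-v1.5.zip",  # placeholder
    },
).raise_for_status()
```

Tracing the model and producing a correct zip/hash is exactly the part that is error-prone for users without a machine-learning background.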
What solution would you like?
The BGE models (https://huggingface.co/BAAI/bge-small-en-v1.5, https://huggingface.co/BAAI/bge-base-en-v1.5, https://huggingface.co/BAAI/bge-large-en-v1.5) offer very strong text_embedding representations among models of the same size, and they can be used in the same way as the other sentence-transformers text_embedding models.
Since locally deployed models consume cluster resources, we can support bge-small-en-v1.5 and bge-base-en-v1.5 as pretrained models in OpenSearch.
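If BGE models were added to the pretrained list, registration could collapse to a single by-name call, mirroring how the existing pretrained sentence-transformers models are registered. The name and version strings below are assumptions following the current pretrained naming convention, not confirmed identifiers.

```python
import requests

OPENSEARCH = "http://localhost:9200"

# Register by name only: no cluster-setting change, no hand-built tarball.
requests.post(
    f"{OPENSEARCH}/_plugins/_ml/models/_register",
    json={
        "name": "huggingface/BAAI/bge-small-en-v1.5",  # hypothetical pretrained name
        "version": "1.0.1",  # hypothetical version
        "model_format": "TORCH_SCRIPT",
    },
).raise_for_status()
```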
What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.
Do you have any additional context?
opensearch-project/ml-commons#2210