Support for v1/embeddings endpoint #179
Comments
Basaran's current goal is to ensure compatibility with both the text completion and chat completion APIs, which share the same model. To support embeddings, we would need another model (or a set of models), which is how the OpenAI API does it. From an architecture perspective, a GPT-like decoder-only model is not the best choice for obtaining embeddings. I suggest using options like SentenceTransformers or Universal Sentence Encoder. Users would therefore need to deploy additional services regardless, so perhaps the embeddings feature is beyond the scope of the Basaran project? |
Here is the thing: from an end-user perspective, for this to be considered a 'functional' drop-in replacement for the OpenAI API, it needs to replicate all the main features of the API. I understand that the goal is to replicate only the completion part of the API; however, for the vast majority of implementations, that will simply make the project useless and require the end user to implement additional measures. What many (most?) people want is to be able to load up this Docker image, redirect any application that currently uses the OpenAI API, and have it 'just work'. I understand it may be outside the scope of the project, but I honestly hope you will consider expanding the scope to cover the entire API. Without expanding the scope, the use cases are extremely limited. Please do not take this input the wrong way: I certainly appreciate the hard work that people have been doing on this project and hope it continues to thrive. |
Yep, of course.
Yes, this is exactly the use case in OpenCharacters. We need a "one-stop-shop" because the UX for directing users away from the closed ecosystem would otherwise be a 10 step process instead of just "install docker and run this command". It needs to be that simple, and Basaran has been awesome for this so far. It seems like Basaran has enough momentum that it could be the open-source ML server. Also, I just want to emphasise that the Hugging Face repo based approach is excellent and should be maintained for any other APIs (embedding, text-to-image, etc.) |
@Electrofried @josephrocca I completely understand and agree with your point of view! The current difficulty comes from the architecture: supporting embeddings requires deploying additional models, which are much smaller than an LLM but still require significant resources. As a result, Basaran may no longer be a simple Docker image, but rather multiple services behind a router, with increased deployment complexity. For example:
|
We plan to focus on achieving compatibility with the chat API in the short term, and in the long term, we may start a router project that provides a complete replacement for the OpenAI API, where Basaran is one of the backends. This will also enable model selection using the This could be the beginning of a whole new ecosystem! |
This sounds perfect! I was actually going to open a separate issue about allowing multiple LLM models behind the same API (i.e. with a single Docker command, but with multiple (Maybe even an option for lazily-loaded models so you don't specify Either way, I'm very excited for this - it would be awesome if the open source ML ecosystem had a full, drop-in replacement for OpenAI APIs. |
@peakji Actually, this might be a really important feature. I can imagine a cloud service (perhaps run by the Basaran project itself as an open-core startup) where I can just tell my OpenCharacters users to swap the api.openai.com URL in their settings for api.basaran.com and then everything works exactly the same, and you just specify the huggingface user/repo as the |
+1 for /embeddings :) |
Related project: https://github.com/closedai-project/closedai They're also intending to add embeddings, image generation, etc. but currently they only support completion and chat completion. Might be room for some collaboration here |
How about adding the following lines here?

    def get_embeddings(self, input_ids):
        # Add a batch dimension if a single unbatched sequence was passed.
        if input_ids.ndim == 1:
            input_ids = input_ids[None, :]
        # Run only the base model (no LM head); outs[0] holds the hidden states.
        outs = self.model.base_model.forward(input_ids, return_dict=False)
        features = outs[0].float()
        # Mean-pool over the sequence dimension.
        return features.mean(dim=1)[0]  # force batch 1

Reference: https://github.com/UKPLab/sentence-transformers/blob/214498f/sentence_transformers/SentenceTransformer.py#L809 |
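The snippet above averages every token position, which is only safe because it forces a batch of one. As a dependency-free sketch of the same pooling idea for padded sequences (the function name `mean_pool` and the toy vectors are illustrative assumptions, not part of Basaran or sentence-transformers), padding positions can be excluded with a mask before averaging:

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is 0)."""
    if len(token_vectors) != len(mask):
        raise ValueError("token_vectors and mask must have the same length")
    # Keep only the positions that correspond to real tokens.
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept positions.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token positions with 2-dimensional hidden states; the last is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = mean_pool(hidden, [1, 1, 0])
```

Without the mask, the padding vector would dominate the average, which is one reason batched embedding code usually carries the attention mask through to the pooling step.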
@hewr1993 It may not be a good idea to use a text generation model to obtain embeddings. I will explain in detail below. |
We have decided not to add support for embeddings in Basaran:

Currently, Basaran is designed to serve only one model per process. Considering the capabilities of commodity hardware and the size of the latest models, we believe this is a reasonable design.

The primary goal of Basaran is to provide text completion capabilities (and soon-to-be-added chat completion), which generally require a decoder-only or encoder-decoder architecture. However, the best practice for embedding models is to use a Transformer encoder. Therefore, it is challenging to reuse the same model to support both completion and embedding. In fact, the OpenAI embedding model, sometimes referred to as GPT-3 embedding, is actually a separate encoder model initialized with the weights of GPT or Codex[1].

In addition to the limitations imposed by the model structure, we currently cannot achieve full compatibility with OpenAI's embedding API using open-source models: OpenAI's latest model, text-embedding-ada-002, has successfully replaced the four previous models used for different scenarios, thereby providing a simple unified embedding API. However, the models in the open-source community are currently not as versatile. They either require different models for symmetric or asymmetric tasks or need specific instructions to adapt to different domains[2].

Therefore, considering the engineering and research limitations, we have decided not to add support for embeddings in Basaran. In the future, we may initiate a new project that acts as a router to fully support all OpenAI APIs. Multiple Basaran instances (or other inference services) can be mounted to achieve load balancing and model selection at that time.

References:
|
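To make the router idea more concrete, here is a minimal sketch of how requests might be dispatched by endpoint path and by the `model` field that OpenAI-style request bodies carry. Everything in it is an assumption for illustration: the routing table, the repo names, the backend URLs, and the function `pick_backend` are hypothetical and do not describe an existing Basaran component.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

# Hypothetical routing table: (endpoint path, model name) -> backend base URLs.
# Every model name and URL below is a placeholder, not a real deployment.
ROUTES: Dict[Tuple[str, str], List[str]] = {
    ("/v1/completions", "user/llm-repo"): [
        "http://basaran-0:80",
        "http://basaran-1:80",
    ],
    ("/v1/embeddings", "user/encoder-repo"): ["http://embedder-0:80"],
}

# One round-robin cursor per route gives simple load balancing.
_cursors: Dict[Tuple[str, str], Iterator[str]] = {
    key: itertools.cycle(urls) for key, urls in ROUTES.items()
}


def pick_backend(path: str, model: str) -> str:
    """Return the full URL of the next backend serving `model` at `path`."""
    key = (path, model)
    if key not in _cursors:
        raise KeyError(f"no backend serves {model!r} at {path}")
    return next(_cursors[key]) + path
```

In this sketch each backend still serves a single model per process, so the one-model-per-process design stays intact while the router handles model selection and spreads requests across replicas.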
I'm not sure how feasible or within-scope this is, but it'd be very useful if the Basaran project were able to implement the v1/embeddings endpoint (using Hugging Face repos, like with the v1/completions endpoint).
Text embeddings are very often used alongside the completion endpoints, and we have this particular requirement for OpenCharacters so we can save and search over the character's "memories".
(And very soon we'll have the same requirement for text-to-image. If it were possible for Basaran to aim to be the OpenAI-compatible, open-source API server, that would be awesome.)
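For context on the "memories" use case, searching over stored embeddings usually comes down to ranking them by similarity to a query embedding. The sketch below is only an illustration under stated assumptions: the vectors are hand-written toy values rather than output from any real model, and `cosine_similarity` and `search_memories` are hypothetical helper names, not part of OpenCharacters or Basaran.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search_memories(
    query: Sequence[float],
    memories: Sequence[Tuple[str, Sequence[float]]],
    top_k: int = 1,
) -> List[str]:
    """Return the texts of the top_k memories most similar to the query."""
    ranked = sorted(
        memories,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy 3-dimensional "embeddings"; a real model would produce far longer vectors.
MEMORIES = [
    ("likes green tea", [0.9, 0.1, 0.0]),
    ("grew up by the sea", [0.0, 0.8, 0.6]),
]
```

In practice the query vector and the stored vectors would have to come from the same embedding model, which is why a v1/embeddings endpoint next to the completion endpoint is convenient for this workflow.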