As I understand it, right now the HTTP API doesn't provide an endpoint for tokenizing a string.

I was interested in this because I'd like to use tokens with logit biases, but I also want my application to interact with the model purely via the HTTP API (I'd like to deploy the HTTP API on a more powerful machine while I work).

I considered simply tokenizing the string on my local machine rather than the machine with the model: it looks like the `Llama` class has a `tokenize()` method, but constructing a `Llama` instance requires a model path. Is it possible to tokenize without a model? Do all Llama models use the same tokenizer (sorry for what might be a silly question, still learning)?
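For reference, here is roughly what I mean by tokenizing locally (a sketch; the model path is a placeholder):

```python
# Sketch of tokenizing locally with llama-cpp-python instead of the HTTP API.
# The model path is a placeholder; a model file is still needed to construct
# the Llama object, which is exactly the requirement I'd like to avoid.
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")

# Llama.tokenize() takes UTF-8 bytes and returns a list of token ids.
tokens = llm.tokenize(b"Hello, world!")
print(len(tokens), tokens)

# The resulting ids could then be used as keys in an OpenAI-style
# logit_bias map, which is my actual use case.
logit_bias = {token: 5.0 for token in tokens}
```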
As I understand it, the HTTP API provided is intended to be OpenAI-compatible. I'm not sure whether the OpenAI API supports tokenizing, but would the team be opposed to making the API a superset of the OpenAI API? I could look into adding a tokenize endpoint if that was considered acceptable and valuable. Perhaps non-OpenAI routes could go under a different root path.

Thanks so much for an awesome project. It's been really great to learn and play with while staying more or less on the cutting edge of the llama.cpp features.

Apologies if I've overlooked any related PRs/Issues/Discussions on this!
Replies: 1 comment
Hi dwillie, I'm interested in the same feature. I'm working on a document-augmented chatbot that integrates the REST API server of llama-cpp-python. When I retrieve documents from my vector store, I want to fill my context with as many documents as possible, and for that I need to know how many tokens a given string takes. I implemented a prototype for that:
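In essence it is one extra route on top of the existing FastAPI app. A minimal sketch (the route path, the pydantic models, and the way the `Llama` instance is obtained are illustrative, not the exact prototype):

```python
# Minimal sketch of a tokenize endpoint on top of the llama-cpp-python
# FastAPI server. Route path, request/response models, and model loading
# are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llama = Llama(model_path="./models/7B/ggml-model.bin")  # placeholder path

class TokenizeRequest(BaseModel):
    text: str

class TokenizeResponse(BaseModel):
    tokens: list[int]
    count: int

@app.post("/extras/tokenize", response_model=TokenizeResponse)
def tokenize(request: TokenizeRequest) -> TokenizeResponse:
    # Llama.tokenize() expects UTF-8 bytes and returns a list of token ids.
    tokens = llama.tokenize(request.text.encode("utf-8"))
    return TokenizeResponse(tokens=tokens, count=len(tokens))
```

The client then does a single POST per candidate document and uses `count` to decide how many documents still fit into the context window.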
In this way, the API stays compatible with OpenAI; it is just a superset. Any suggestions to make the endpoint more sophisticated?