`truncate` parameter is ignored with openai endpoint in chat_completions #1654
Labels: bug
Bug description
We use a locally hosted Chat-UI. Chat-UI has a `truncate` parameter that is ignored when the openai endpoint is used. We use vLLM to host the Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 model, limited to a 12000-token context length (`--max-model-len 12000`). I want Chat-UI to truncate the entire chat history to 12000 tokens, but it sends longer prompts as-is.
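For context, the server is started roughly like this (illustrative; the exact invocation depends on the vLLM version, and older versions use `python -m vllm.entrypoints.openai.api_server --model ...` instead):

```sh
# Illustrative vLLM launch with the context length capped at 12000 tokens
vllm serve Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 --max-model-len 12000
```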
Steps to reproduce

Expected behaviour: the message is truncated according to the `truncate` parameter in the config.

Actual behaviour: the message is not truncated, vLLM responds with Bad Request, and the user sees an error message (screenshot attached).
Screenshots
Context
Logs
Specs
Config
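A minimal sketch of the relevant model config, assuming the usual chat-ui `MODELS` entry in `.env.local`; the `baseURL` is a placeholder for our vLLM server:

```env
MODELS=`[
  {
    "name": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "parameters": { "truncate": 12000 },
    "endpoints": [
      {
        "type": "openai",
        "baseURL": "http://localhost:8000/v1",
        "completion": "chat_completions"
      }
    ]
  }
]`
```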
Notes
The problem is in `src\lib\server\endpoints\openai\endpointOai.ts`: it uses `buildPrompt`, which truncates messages according to `model.parameters?.truncate`, but only when `completion === "completions"`. If `completion === "chat_completions"` is set, the `truncate` parameter is ignored.