
truncate parameter is ignored with openai endpoint in chat_completions #1654

Open · ishatalkin opened this issue Jan 17, 2025 · 0 comments
Labels: bug (Something isn't working)

@ishatalkin
Bug description

We run a locally hosted chat-ui. chat-ui has a truncate parameter, but it is ignored when using the openai endpoint. We use vLLM to host the Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 model, with the context length limited to 12000 tokens (--max-model-len 12000). I want chat-ui to truncate the entire chat history to fit within that limit, but it sends long messages as-is.

Steps to reproduce

  1. Run chat-ui with the config below in .env
  2. Create a new chat and send a long prompt, e.g. the text of Tolstoy's War and Peace

Expected behaviour: the message is truncated according to the parameters.truncate config.
Actual behaviour: the message is not truncated, vLLM responds with 400 Bad Request, and the user sees an error message (screenshot attached).

Screenshots

[Screenshot: Bad Request error message shown to the user in chat-ui]

Context

Logs

{"level":20,"time":1737128439457,"pid":22,"hostname":"ai","locals":{},"url":"/conversation/678a79f72799fb30aa1649a8","params":{"id":"678a79f72799fb30aa1649a8"},"request":{}}
{"level":50,"time":1737128439545,"pid":22,"hostname":"ai","err":{"type":"BadRequestError","message":"400 status code (no body)","stack":"Error: 400 status code (no body)\n    at APIError.generate (file:///app/build/server/chunks/index-D9Zeknfx.js:1465:20)\n    at OpenAI.makeStatusError (file:///app/build/server/chunks/index-D9Zeknfx.js:919:25)\n    at OpenAI.makeRequest (file:///app/build/server/chunks/index-D9Zeknfx.js:962:30)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///app/build/server/chunks/models-D_zYZ0VB.js:4339:36\n    at async generate (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:457:30)\n    at async textGenerationWithoutTitle (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:529:3)","status":400,"headers":{"content-length":"273","content-type":"application/json","date":"Fri, 17 Jan 2025 15:40:38 GMT","server":"uvicorn"}},"msg":"400 status code (no body)"}
{"level":50,"time":1737128439546,"pid":22,"hostname":"ai","err":{"type":"BadRequestError","message":"400 status code (no body)","stack":"Error: 400 status code (no body)\n    at APIError.generate (file:///app/build/server/chunks/index-D9Zeknfx.js:1465:20)\n    at OpenAI.makeStatusError (file:///app/build/server/chunks/index-D9Zeknfx.js:919:25)\n    at OpenAI.makeRequest (file:///app/build/server/chunks/index-D9Zeknfx.js:962:30)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///app/build/server/chunks/models-D_zYZ0VB.js:4339:36\n    at async generateFromDefaultEndpoint (file:///app/build/server/chunks/index3-DO-DlP2V.js:1056:23)\n    at async generateTitle (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:216:10)\n    at async generateTitleForConversation (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:180:19)","status":400,"headers":{"content-length":"271","content-type":"application/json","date":"Fri, 17 Jan 2025 15:40:38 GMT","server":"uvicorn"}},"msg":"400 status code (no body)"}
{"level":20,"time":1737128439570,"pid":22,"hostname":"ai","locals":{},"url":"/conversation/678a79f72799fb30aa1649a8","params":{"id":"678a79f72799fb30aa1649a8"},"request":{}}

Specs

  • OS: Linux
  • Browser: Chrome
  • chat-ui version: v0.9.4

Config

MODELS=`[
  {
    "name": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "tokenizer": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "preprompt": "",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
      "temperature": 0.7,
      "truncate": 3072,
      "max_new_tokens": 1024
    },
    "endpoints": [{
      "type" : "openai",
      "baseURL": "http://127.0.0.1:8000/v1"
    }],
  },
]`
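
A possible workaround, based on the analysis in the Notes section below (untested; it assumes vLLM's /v1/completions endpoint works for this model): forcing the endpoint into completions mode makes chat-ui go through buildPrompt, which does apply truncate:

```
"endpoints": [{
  "type": "openai",
  "baseURL": "http://127.0.0.1:8000/v1",
  "completion": "completions"
}]
```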

Notes

The problem is in src/lib/server/endpoints/openai/endpointOai.ts: it uses buildPrompt, which truncates messages according to model.parameters?.truncate, but only when completion === "completions". When completion === "chat_completions" is set (as in this setup), the truncate parameter is ignored.
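
For illustration, a minimal sketch of the kind of truncation the chat_completions path would need (hypothetical: ChatMessage and countTokens stand in for chat-ui's actual message type and tokenizer, and this is not the project's API):

```ts
// Hypothetical sketch, not chat-ui's actual code: `ChatMessage` and
// `countTokens` are placeholders for the project's real message type
// and tokenizer.
interface ChatMessage {
	role: "system" | "user" | "assistant";
	content: string;
}

function truncateMessages(
	messages: ChatMessage[],
	truncate: number,
	countTokens: (text: string) => number
): ChatMessage[] {
	// Keep the system prompt (if any) out of the sliding window.
	const system = messages[0]?.role === "system" ? [messages[0]] : [];
	const rest = messages.slice(system.length);

	let budget = truncate - system.reduce((n, m) => n + countTokens(m.content), 0);

	// Walk backwards from the newest message, keeping as many whole
	// messages as fit in the remaining token budget.
	const kept: ChatMessage[] = [];
	for (let i = rest.length - 1; i >= 0; i--) {
		const cost = countTokens(rest[i].content);
		if (cost > budget) break;
		budget -= cost;
		kept.unshift(rest[i]);
	}
	// NOTE: a complete fix would also have to handle a single message
	// longer than the budget, the way buildPrompt truncates the prompt string.
	return [...system, ...kept];
}
```

With something like this, the openai endpoint could apply parameters.truncate to the messages array before calling the chat completions API, mirroring what buildPrompt already does for the completions case.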
