
truncate parameter is ignored with openai endpoint in chat_completions #1654

Open · ishatalkin opened this issue Jan 17, 2025 · 0 comments
Labels: bug (Something isn't working)

@ishatalkin
Bug description

We run a locally hosted chat-ui. chat-ui has a truncate parameter, but it is ignored when using the openai endpoint. We use vLLM to host the Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 model, with the context length limited to 12000 tokens (--max-model-len 12000). I want chat-ui to truncate the entire chat history to fit within that limit, but it sends long messages as-is.

Steps to reproduce

  1. Run chat-ui with the config below in .env
  2. Create a new chat and send a long prompt, e.g. the text of Tolstoy's War and Peace

Expected behaviour: the message is truncated according to the parameters.truncate config.
Actual behaviour: the message is not truncated, vLLM responds with 400 Bad Request, and the user sees an error message (screenshot attached).

Screenshots

[Screenshot: Bad Request error message shown to the user in chat-ui]

Context

Logs

{"level":20,"time":1737128439457,"pid":22,"hostname":"ai","locals":{},"url":"/conversation/678a79f72799fb30aa1649a8","params":{"id":"678a79f72799fb30aa1649a8"},"request":{}}
{"level":50,"time":1737128439545,"pid":22,"hostname":"ai","err":{"type":"BadRequestError","message":"400 status code (no body)","stack":"Error: 400 status code (no body)\n    at APIError.generate (file:///app/build/server/chunks/index-D9Zeknfx.js:1465:20)\n    at OpenAI.makeStatusError (file:///app/build/server/chunks/index-D9Zeknfx.js:919:25)\n    at OpenAI.makeRequest (file:///app/build/server/chunks/index-D9Zeknfx.js:962:30)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///app/build/server/chunks/models-D_zYZ0VB.js:4339:36\n    at async generate (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:457:30)\n    at async textGenerationWithoutTitle (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:529:3)","status":400,"headers":{"content-length":"273","content-type":"application/json","date":"Fri, 17 Jan 2025 15:40:38 GMT","server":"uvicorn"}},"msg":"400 status code (no body)"}
{"level":50,"time":1737128439546,"pid":22,"hostname":"ai","err":{"type":"BadRequestError","message":"400 status code (no body)","stack":"Error: 400 status code (no body)\n    at APIError.generate (file:///app/build/server/chunks/index-D9Zeknfx.js:1465:20)\n    at OpenAI.makeStatusError (file:///app/build/server/chunks/index-D9Zeknfx.js:919:25)\n    at OpenAI.makeRequest (file:///app/build/server/chunks/index-D9Zeknfx.js:962:30)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///app/build/server/chunks/models-D_zYZ0VB.js:4339:36\n    at async generateFromDefaultEndpoint (file:///app/build/server/chunks/index3-DO-DlP2V.js:1056:23)\n    at async generateTitle (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:216:10)\n    at async generateTitleForConversation (file:///app/build/server/chunks/_server.ts-CYoczJsh.js:180:19)","status":400,"headers":{"content-length":"271","content-type":"application/json","date":"Fri, 17 Jan 2025 15:40:38 GMT","server":"uvicorn"}},"msg":"400 status code (no body)"}
{"level":20,"time":1737128439570,"pid":22,"hostname":"ai","locals":{},"url":"/conversation/678a79f72799fb30aa1649a8","params":{"id":"678a79f72799fb30aa1649a8"},"request":{}}

Specs

  • OS: Linux
  • Browser: Chrome
  • chat-ui version: v0.9.4

Config

MODELS=`[
  {
    "name": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "tokenizer": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "preprompt": "",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
      "temperature": 0.7,
      "truncate": 3072,
      "max_new_tokens": 1024
    },
    "endpoints": [{
      "type" : "openai",
      "baseURL": "http://127.0.0.1:8000/v1"
    }],
  },
]`
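
A possible workaround, based on the analysis in the Notes section below (untested; it assumes vLLM's /v1/completions endpoint works for this model): forcing the endpoint into completions mode makes chat-ui go through buildPrompt, which does apply truncate:

```
"endpoints": [{
  "type": "openai",
  "baseURL": "http://127.0.0.1:8000/v1",
  "completion": "completions"
}]
```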

Notes

The problem is in src/lib/server/endpoints/openai/endpointOai.ts: it uses buildPrompt, which truncates messages according to model.parameters?.truncate, but only when completion === "completions". When completion === "chat_completions" is set (as in this setup), the truncate parameter is ignored.
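
For illustration, a minimal sketch of the kind of truncation the chat_completions path would need (hypothetical: ChatMessage and countTokens stand in for chat-ui's actual message type and tokenizer, and this is not the project's API):

```ts
// Hypothetical sketch, not chat-ui's actual code: `ChatMessage` and
// `countTokens` are placeholders for the project's real message type
// and tokenizer.
interface ChatMessage {
	role: "system" | "user" | "assistant";
	content: string;
}

function truncateMessages(
	messages: ChatMessage[],
	truncate: number,
	countTokens: (text: string) => number
): ChatMessage[] {
	// Keep the system prompt (if any) out of the sliding window.
	const system = messages[0]?.role === "system" ? [messages[0]] : [];
	const rest = messages.slice(system.length);

	let budget = truncate - system.reduce((n, m) => n + countTokens(m.content), 0);

	// Walk backwards from the newest message, keeping as many whole
	// messages as fit in the remaining token budget.
	const kept: ChatMessage[] = [];
	for (let i = rest.length - 1; i >= 0; i--) {
		const cost = countTokens(rest[i].content);
		if (cost > budget) break;
		budget -= cost;
		kept.unshift(rest[i]);
	}
	// NOTE: a complete fix would also have to handle a single message
	// longer than the budget, the way buildPrompt truncates the prompt string.
	return [...system, ...kept];
}
```

With something like this, the openai endpoint could apply parameters.truncate to the messages array before calling the chat completions API, mirroring what buildPrompt already does for the completions case.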
