Replies: 2 comments
-
I believe the problem comes from the API in streaming mode; turning streaming off works well. Here is my code snippet, FYI: local_llm = ChatOpenAI(
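A fuller sketch of the workaround (the model name and endpoint here are placeholders for a local vLLM deployment, not from my original snippet):

```python
from langchain_openai import ChatOpenAI

# Placeholder endpoint/model values; adjust to your deployment.
local_llm = ChatOpenAI(
    model="Qwen2.5-14B-Instruct",
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    streaming=False,  # workaround: usage stats look sane with streaming off
)

result = local_llm.invoke("...")  # your long Chinese text here
print(result.usage_metadata)     # input_tokens / output_tokens / total_tokens
```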
-
That's definitely unexpected (the theoretical maximum is 4 tokens per char). There have been several fixes to usage stats since vLLM 0.6.0. Can you try upgrading vLLM?
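To double-check, you can tokenize the same text locally with the model's tokenizer and compare against the server-reported usage (a sketch using the Hugging Face tokenizer for Qwen2.5-14B-Instruct; the ratios in the comments are the rule-of-thumb expectation, not measured values):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")

text = "..."  # your 10000-char Chinese input
token_ids = tokenizer.encode(text)
print(len(text), len(token_ids), len(token_ids) / len(text))
# For Chinese text this ratio should be roughly 0.55-0.67 tokens per char,
# nowhere near the ~200 tokens per char that the reported usage implies.
```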
-
Hi all, I want to summarize long Chinese text extracted from a PDF. However, when the input exceeds 12000 chars, repetition and hallucination appear.
But the most confusing part is the token count: the reported token count is almost 100 times the character count. When I tested on gpt-4o, the token count was about 1x the character count. Is there something I missed?
Per the Qwen doc,
https://qwen.readthedocs.io/en/latest/getting_started/concepts.html
"As a rule of thumb, 1 token is 3
4 characters for English texts and 1.51.8 characters for Chinese texts."deploy info:
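To make the gap concrete, here is the back-of-the-envelope expectation for my 12000-char input versus what include_usage reports (my arithmetic, using the doc's rule of thumb):

```python
chars = 12000
# Qwen rule of thumb for Chinese: 1 token per 1.5-1.8 characters
print(chars / 1.8, chars / 1.5)  # expected: roughly 6667 to 8000 tokens
# Reported below for the 12000-char run: input_tokens = 3158311,
# i.e. about 400x the expected count.
```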
Deploy info:
vLLM 0.6
Qwen2.5-14B-Instruct
Length info:

input_len: 10000
output_len: 535
output from include_usage:
{'input_tokens': 2046816, 'output_tokens': 47585, 'total_tokens': 2094401, 'input_token_details': {}, 'output_token_details': {}}

input_len: 12000
output_len: 678
output from include_usage:
{'input_tokens': 3158311, 'output_tokens': 81002, 'total_tokens': 3239313, 'input_token_details': {}, 'output_token_details': {}}

input_len: 14000
output_len: 42
output from include_usage:
{'input_tokens': 3396911, 'output_tokens': 69377, 'total_tokens': 3466288, 'input_token_details': {}, 'output_token_details': {}}
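A minimal way to cross-check these numbers, bypassing LangChain and reading usage straight from the OpenAI-compatible stream (base_url, api_key, and model name are assumptions for a local vLLM deployment):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="Qwen2.5-14B-Instruct",
    messages=[{"role": "user", "content": "..."}],  # the long Chinese text
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    # Per the OpenAI streaming contract, usage is None on content chunks
    # and set only on the final chunk. If a server attaches (cumulative)
    # usage to every chunk and the client sums them, totals get inflated
    # by roughly the number of chunks -- the same order of magnitude as
    # the blow-up seen above. This is a guess at the failure mode, not
    # a confirmed diagnosis.
    if chunk.usage is not None:
        usage = chunk.usage

print(usage)  # tokens as reported by the final chunk only
```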