I’m wondering how to properly configure vLLM for a 64k context. Should I set the rope_scaling factor to 2.0, or is it better to keep the default 128k rope_scaling and simply pass max_model_len=65536 when launching vLLM?
Could you also share the configuration used for the tests on https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html?
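For context, here is a minimal sketch of what the explicit rope_scaling route might look like with vLLM's offline `LLM` API. The model name, the YaRN key names, and the factor of 2.0 are assumptions for illustration (a 32768-token native window scaled by 2.0 gives 65536), not the documented benchmark configuration:

```python
from vllm import LLM, SamplingParams

# Sketch only: model name and rope_scaling values are assumptions,
# not the configuration used for the speed benchmark.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # hypothetical checkpoint
    max_model_len=65536,                # 64k context window
    rope_scaling={                      # assumed YaRN scaling: 32768 * 2.0 = 65536
        "rope_type": "yarn",            # older configs may use the key "type" instead
        "factor": 2.0,
        "original_max_position_embeddings": 32768,
    },
)

out = llm.generate(["Hello, world."], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The same settings can presumably be passed to the OpenAI-compatible server via `--max-model-len 65536` and `--rope-scaling` with a JSON value, if serving is preferred over the offline API.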