I’m wondering how to properly configure vLLM for a 64k context. Should I set the rope_scaling factor to 2.0, or is it better to keep the default 128k rope_scaling and simply pass max_model_len=65536 when launching vLLM?
Could you also share the configuration used for the tests on https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html?
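For context, here is a minimal sketch of what the explicit rope_scaling route might look like with vLLM's offline `LLM` API. The model name, the YaRN key names, and the factor of 2.0 are assumptions for illustration (a 32768-token native window scaled by 2.0 gives 65536), not the documented benchmark configuration:

```python
from vllm import LLM, SamplingParams

# Sketch only: model name and rope_scaling values are assumptions,
# not the configuration used for the speed benchmark.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # hypothetical checkpoint
    max_model_len=65536,                # 64k context window
    rope_scaling={                      # assumed YaRN scaling: 32768 * 2.0 = 65536
        "rope_type": "yarn",            # older configs may use the key "type" instead
        "factor": 2.0,
        "original_max_position_embeddings": 32768,
    },
)

out = llm.generate(["Hello, world."], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The same settings can presumably be passed to the OpenAI-compatible server via `--max-model-len 65536` and `--rope-scaling` with a JSON value, if serving is preferred over the offline API.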