Add tokenizer #394
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀
@@ -924,6 +925,14 @@ async def get_model_config(self) -> ModelConfig:
        else:
            return self.engine.get_model_config()

    async def get_parallel_config(self) -> ParallelConfig:
        """Get the parallel configuration of the vLLM engine."""
        if self.engine_use_ray:
these `if`s are outta control, the ray engine should totally be a separate `VLLMBackend` 😉 ...a change for another day, or week
Good idea
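For context, a rough sketch of the refactor suggested above: each accessor currently branches on `engine_use_ray`, and a Ray-specific backend class could fold that branch into one place. The class and method bodies below are illustrative assumptions, not the actual vLLM code.

```python
# Illustrative sketch only -- names and bodies are hypothetical, not vLLM's API.
# Today each accessor repeats the same branch on engine_use_ray; a dedicated
# Ray-backed wrapper would absorb it.

class LocalEngineBackend:
    """Wraps an in-process engine; calls are plain method calls."""

    def __init__(self, engine):
        self.engine = engine

    async def get_parallel_config(self):
        return self.engine.get_parallel_config()


class RayEngineBackend:
    """Wraps a Ray actor handle; calls go through .remote() and are awaited."""

    def __init__(self, engine_actor):
        self.engine = engine_actor

    async def get_parallel_config(self):
        return await self.engine.get_parallel_config.remote()
```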
@@ -924,6 +925,14 @@ async def get_model_config(self) -> ModelConfig:
        else:
            return self.engine.get_model_config()

    async def get_parallel_config(self) -> ParallelConfig:
can these new methods go into the `VLLMBackend` protocol as well?
I don't think they should be in the protocol, because a backend implementing the Protocol doesn't have to provide these methods, and most of the time it won't.
ah yeah I see these are only on the AsyncLLMEngine, 🌶️
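To make the trade-off concrete, here is a minimal sketch (hypothetical names, not the actual `VLLMBackend` definition) of why keeping these getters off the protocol makes sense: a protocol method obligates every backend to provide it, while only the `AsyncLLMEngine`-backed implementation actually has these configs.

```python
from typing import Protocol


# Hypothetical stand-in for the VLLMBackend protocol -- not the real definition.
class Backend(Protocol):
    async def get_model_config(self): ...
    # Adding get_parallel_config() here would require every backend to
    # implement it, even though only the AsyncLLMEngine-backed one can.


class AsyncEngineBackend:
    """Sketch of a backend wrapping AsyncLLMEngine; the engine-specific
    getters stay as concrete methods instead of protocol requirements."""

    async def get_model_config(self): ...
    async def get_parallel_config(self): ...  # engine-specific extra
```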
@@ -119,6 +119,7 @@ async def test_single_completion(client: openai.AsyncOpenAI, model_name: str,
    choice = completion.choices[0]
    assert len(choice.text) >= 5
    assert choice.finish_reason == "length"
    print(completion.usage)
print!
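One way to address the nit, as a sketch rather than the change actually made in this PR: assert on the usage accounting instead of printing it. This assumes the `completion` object from the test above and the standard field names on the OpenAI client's usage object.

```python
# Sketch of an assertion-based alternative to the stray print (not the
# change made in this PR); `completion` is the response from the test above.
assert completion.usage is not None
assert completion.usage.prompt_tokens > 0
assert completion.usage.completion_tokens > 0
assert completion.usage.total_tokens == (
    completion.usage.prompt_tokens + completion.usage.completion_tokens)
```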
        request_id=generate_request.request_id,
        lora_request=generate_request.lora_request,
        trace_headers=generate_request.trace_headers,
        prompt_adapter_request=generate_request.prompt_adapter_request)
🤦 yeah this would do it lol
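For readers without the full diff, a hedged reconstruction of the call being fixed: the point is that the request's LoRA, tracing, and prompt-adapter fields are now forwarded to the engine. Only the keyword arguments appear in the hunk above; the positional arguments and variable names are assumptions.

```python
# Hedged reconstruction -- only the keyword arguments below appear in the
# diff; the positional arguments and surrounding names are assumptions.
results_generator = engine.generate(
    prompt,                      # assumed
    sampling_params,             # assumed
    request_id=generate_request.request_id,
    lora_request=generate_request.lora_request,
    trace_headers=generate_request.trace_headers,
    prompt_adapter_request=generate_request.prompt_adapter_request)
```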
Merged commit f5f0b45 into isolate-oai-server-process.
SUMMARY: ModelConfig, SchedulerConfig, LoRAConfig, ParallelConfig (see the sketch below).
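A sketch of how a caller might consume the configs this PR exposes. Only `get_model_config` and `get_parallel_config` appear in the diff; the scheduler and LoRA getters are assumptions inferred from the summary list, and `engine_client` is a placeholder name.

```python
# Sketch of consuming the exposed configs; get_scheduler_config() and
# get_lora_config() are assumptions inferred from the summary, not shown
# in the diff, and engine_client is a placeholder for the engine wrapper.
model_config = await engine_client.get_model_config()
parallel_config = await engine_client.get_parallel_config()
scheduler_config = await engine_client.get_scheduler_config()  # assumed
lora_config = await engine_client.get_lora_config()            # assumed
```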