This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Add tokenizer #394

Merged
merged 9 commits into isolate-oai-server-process from add-tokenizer on Jul 31, 2024

Conversation

robertgshaw2-redhat
Collaborator

SUMMARY:

  • add endpoints to request ModelConfig, SchedulerConfig, LoRAConfig, ParallelConfig
  • factor out tokenizer group creation function to be a utility function
  • create tokenizer_group on client side
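
For context, a rough sketch of the client-side flow these changes enable. The names below (engine_client, the getter methods, init_tokenizer_from_configs) are illustrative assumptions, not necessarily the exact identifiers in this diff:

async def build_local_tokenizer_group(engine_client):
    # `engine_client` stands in for the client talking to the engine process;
    # the getters mirror the configs listed in the summary above.
    model_config = await engine_client.get_model_config()
    scheduler_config = await engine_client.get_scheduler_config()
    parallel_config = await engine_client.get_parallel_config()
    lora_config = await engine_client.get_lora_config()

    # Build the tokenizer group on the client side with the factored-out
    # utility (function name assumed for illustration).
    return init_tokenizer_from_configs(
        model_config=model_config,
        scheduler_config=scheduler_config,
        parallel_config=parallel_config,
        lora_config=lora_config,
    )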

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests that quickly catch errors. You can run additional CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@@ -924,6 +925,14 @@ async def get_model_config(self) -> ModelConfig:
        else:
            return self.engine.get_model_config()

    async def get_parallel_config(self) -> ParallelConfig:
        """Get the parallel configuration of the vLLM engine."""
        if self.engine_use_ray:

Collaborator

these ifs are outta control, the ray engine should totally be a separate VLLMBackend 😉

...a change for another day, or week

Collaborator Author

Good idea
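
To illustrate the suggestion above, a minimal sketch (class and method names are assumptions, not the real vLLM backends): instead of branching on self.engine_use_ray inside every getter, the Ray path could live in its own backend that exposes the same interface.

class LocalBackend:
    def __init__(self, engine):
        self.engine = engine

    async def get_model_config(self):
        # Local engine: plain method call.
        return self.engine.get_model_config()


class RayBackend:
    def __init__(self, engine_actor):
        self.engine_actor = engine_actor

    async def get_model_config(self):
        # Ray actor handle: .remote() returns an ObjectRef, which is awaitable.
        return await self.engine_actor.get_model_config.remote()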

@@ -924,6 +925,14 @@ async def get_model_config(self) -> ModelConfig:
        else:
            return self.engine.get_model_config()

    async def get_parallel_config(self) -> ParallelConfig:

Collaborator

can these new methods go into the VLLMBackend protocol as well?

Collaborator Author

I don't think they should be in the protocol, because implementations of the Protocol don't have to provide these, and most of the time they won't.

Collaborator

ah yeah I see these are only on the AsyncLLMEngine, 🌶️
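
For context on the trade-off discussed above, a minimal sketch (an illustrative subset, not the real VLLMBackend definition): a method declared on a typing.Protocol becomes part of the structural contract every backend is checked against, which is why getters that only AsyncLLMEngine provides stay off the protocol.

from typing import Protocol

from vllm.config import ModelConfig


class VLLMBackend(Protocol):  # illustrative subset, not the real definition
    async def get_model_config(self) -> ModelConfig:
        ...

    # get_parallel_config / get_scheduler_config / get_lora_config are left
    # out on purpose: declaring them here would make them part of the
    # contract for every backend, but only AsyncLLMEngine implements them.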

@@ -119,6 +119,7 @@ async def test_single_completion(client: openai.AsyncOpenAI, model_name: str,
    choice = completion.choices[0]
    assert len(choice.text) >= 5
    assert choice.finish_reason == "length"
    print(completion.usage)

Collaborator

print!

request_id=generate_request.request_id,
lora_request=generate_request.lora_request,
trace_headers=generate_request.trace_headers,
prompt_adapter_request=generate_request.prompt_adapter_request)

Collaborator

🤦 yeah this would do it lol

robertgshaw2-redhat merged commit f5f0b45 into isolate-oai-server-process on Jul 31, 2024
1 check passed
robertgshaw2-redhat deleted the add-tokenizer branch on July 31, 2024 at 22:02