
Commit: change to float

Signed-off-by: pandyamarut <[email protected]>
pandyamarut committed Aug 9, 2024
1 parent 9cb9336 commit 967eaba
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md

```diff
@@ -18,7 +18,7 @@ Deploy OpenAI-Compatible Blazing-Fast LLM Endpoints powered by the [vLLM](https:
 ### 1. UI for Deploying vLLM Worker on RunPod console:
 ![Demo of Deploying vLLM Worker on RunPod console with new UI](media/ui_demo.gif)

-### 2. Worker vLLM `v1.1` with vLLM `0.5.3` now available under `stable` tags
+### 2. Worker vLLM `v1.2.0` with vLLM `0.5.4` now available under `stable` tags
 Update v1.1 is now available, use the image tag `runpod/worker-v1-vllm:stable-cuda12.1.0`.

 ### 3. OpenAI-Compatible [Embedding Worker](https://github.com/runpod-workers/worker-infinity-embedding) Released
```
2 changes: 1 addition & 1 deletion src/engine_args.py

```diff
@@ -15,7 +15,7 @@
 DEFAULT_ARGS = {
     "disable_log_stats": os.getenv('DISABLE_LOG_STATS', 'False').lower() == 'true',
     "disable_log_requests": os.getenv('DISABLE_LOG_REQUESTS', 'False').lower() == 'true',
-    "gpu_memory_utilization": int(os.getenv('GPU_MEMORY_UTILIZATION', 0.9)),
+    "gpu_memory_utilization": float(os.getenv('GPU_MEMORY_UTILIZATION', 0.95)),
     "pipeline_parallel_size": int(os.getenv('PIPELINE_PARALLEL_SIZE', 1)),
     "tensor_parallel_size": int(os.getenv('TENSOR_PARALLEL_SIZE', 1)),
     "served_model_name": os.getenv('SERVED_MODEL_NAME', None),
```
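The one-character fix above matters because `GPU_MEMORY_UTILIZATION` is a fraction in (0, 1], not an integer. The sketch below (a minimal standalone reproduction, not code from the repository; the helper name `parse_gpu_mem_util` is invented for illustration) shows the two failure modes of the old `int(...)` parsing: the float default gets truncated to `0` when the variable is unset, and a fractional string like `"0.9"` raises `ValueError` when it is set.

```python
import os

def parse_gpu_mem_util(default: float = 0.95) -> float:
    # The fixed parsing: gpu_memory_utilization is a fraction,
    # so parse the env var (or the default) as a float.
    return float(os.getenv('GPU_MEMORY_UTILIZATION', default))

# With the variable unset, the float default survives intact.
os.environ.pop('GPU_MEMORY_UTILIZATION', None)
assert parse_gpu_mem_util() == 0.95

# Old behavior, failure mode 1: int() truncates the unset
# float default 0.9 down to 0.
assert int(0.9) == 0

# Old behavior, failure mode 2: int() on a fractional env
# string crashes outright.
os.environ['GPU_MEMORY_UTILIZATION'] = '0.9'
try:
    int(os.environ['GPU_MEMORY_UTILIZATION'])
    raised = False
except ValueError:
    raised = True
assert raised

# The new float() parsing handles the same value cleanly.
assert parse_gpu_mem_util() == 0.9
```

So under the old code a deployment with the variable unset would request 0% of GPU memory, and one that set `GPU_MEMORY_UTILIZATION=0.9` would fail at startup; `float()` fixes both paths.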
