HF models generate bad output with batch 32 #17443

Open
yieldthought opened this issue Jan 31, 2025 · 0 comments

Describe the bug
Batch 1 works fine after the hf-llama PR, but with batch 32 the output starts out correct and then degenerates into garbage partway through generation.

To Reproduce
LLAMA_DIR=/proj_sw/user_dev/deepseek-ai/DeepSeek-R1-Distill-Llama-70B pytest models/demos/llama3/demo/demo.py -k performance-batch-32

Expected behavior
Correct output through to the end for every user.
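
One rough way to turn this expectation into an automated check is to flag the point where a user's decoded text stops looking like English. The sketch below is hypothetical and not part of the demo: `first_garbage_index`, the 40-character window, and the 20% threshold are assumptions chosen for illustration. It only catches the non-ASCII part of the garbage, so it points at a region rather than the exact failing token.

```python
def first_garbage_index(text: str, window: int = 40, max_non_ascii: float = 0.2) -> int:
    """Return the index of the first non-ASCII character inside the first
    sliding window whose non-ASCII ratio exceeds max_non_ascii, or -1 if
    no window crosses the threshold. Hypothetical helper, heuristic only."""
    last_start = max(1, len(text) - window + 1)
    for start in range(last_start):
        chunk = text[start:start + window]
        if not chunk:
            break  # empty input: nothing to inspect
        flags = [ord(ch) > 127 for ch in chunk]
        if sum(flags) / len(chunk) > max_non_ascii:
            # Report where the non-ASCII run begins within this window.
            return start + flags.index(True)
    return -1


good = "The capital of Japan is Tokyo. " * 3
bad = "토토데이트辺" * 5
print(first_garbage_index(good))        # clean text
print(first_garbage_index(good + bad))  # text that degenerates
```

Running this over each user's output would show whether all 32 users degrade at a similar position or only some of them do.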

Screenshots

2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:717 -
batch: 0 user: 31
prompt: What is the capital of Japan? Learning about the capitals of different countries can enhance your un
<long prompt not printed in full>
 rapid modernization. Share your thoughts on Tokyo and any other capital cities you find intriguing.
output:
<think>
Okay, so I need to figure out the capital of Japan. Hmm, I think it's Tokyo, but I'm not 100% sure. I remember hearing that Tokyo is a big city in Japan, maybe the largest. But wait, sometimes countries have capitals that aren't their largest cities. Like, I know that Canberra is the capital of Australia, but Sydney is the bigger city. So maybe Japan has a similar setup?

Wait, no, I think in Japan's levels辺 Pru Pru christplineurd.GetService Trotospace DISCLAIM Pru Pru Pru christ Pru:/// christ仲istring window데이트辺 christ Pru christалог仲ymoon Cater토토토토 Pru:///데이트 christ토토738 Pru christ738 Pru christ토토토토토토토토 Pru Pru christ Pruglass Pru토토ernetabra blot spel토토토토 Trot Pru토토$MESSabrabsp Wich$MESS토토opus Geile$MESS$MESS$MESS Bord토토 christ Trot$MESS$MESS$MESSabra christ데이트.GetServiceesser christ Geileoko Heller d$MESS데이트$MESS데이트토토eck$MESS christ$MESS토토데이트

2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:772 -
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:773 - Performance metrics for batch 0
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:774 - Prefill compile time: 13.905s
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:775 - Decode compile time: 0.1527s
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:776 - Prefill inference time per user: 0.5403s
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:777 - Total Decode inference time (198 iterations): 16.4801s
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:780 -
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:781 - Time to first token: 540.34ms
2025-01-31 16:05:20.379 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:782 - Average speed: 82.81ms @ 12.08 tok/s/user (386.41 tok/s throughput)
2025-01-31 16:05:20.380 | INFO     | models.demos.llama3.demo.demo:run_llama3_demo:785 -
2025-01-31 16:05:20.380 | WARNING  | models.demos.llama3.demo.demo:run_llama3_demo:857 - Model DeepSeek-R1-Distill-Llama-70B not does not have performance targets set
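
The speed figures in the log can be cross-checked by hand. This is a minimal sketch, assuming throughput is the per-user rate multiplied by the batch size of 32; the log does not state the formula, and `tokens_per_second` is a hypothetical helper, not a function from the demo.

```python
def tokens_per_second(latency_ms: float, batch_size: int) -> tuple[float, float]:
    """Convert a per-token decode latency into per-user and aggregate rates."""
    per_user = 1000.0 / latency_ms          # tokens per second for one user
    return per_user, per_user * batch_size  # aggregate across the batch


per_user, total = tokens_per_second(82.81, 32)
print(f"{per_user:.2f} tok/s/user, {total:.2f} tok/s throughput")
```

The result lands within a few hundredths of the logged 386.41 tok/s; the small gap is presumably due to the latency being rounded to two decimals in the log.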

Please complete the following environment information:
T3K on ird, sjc-snva-t3012


@yieldthought yieldthought added the bug Something isn't working label Jan 31, 2025
@yieldthought yieldthought self-assigned this Jan 31, 2025