Describe the bug
Batch 1 works fine after the hf-llama PR, but batch 32 generates garbage output after a while.
To Reproduce
LLAMA_DIR=/proj_sw/user_dev/deepseek-ai/DeepSeek-R1-Distill-Llama-70B pytest models/demos/llama3/demo/demo.py -k performance-batch-32
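Since the garbage only appears partway through decode, it may help to pinpoint the exact iteration where the batch-32 run first diverges from the known-good batch-1 run. A minimal sketch (not the demo's actual API; `first_divergence` is an illustrative helper, and collecting the per-user token ids from each run is left to the reader):

```python
def first_divergence(tokens_b1, tokens_b32):
    """Return the decode iteration at which the two token-id streams first
    disagree, or None if they match over their common length."""
    for i, (a, b) in enumerate(zip(tokens_b1, tokens_b32)):
        if a != b:
            return i
    return None

# Example with dummy data: streams diverge at iteration 3.
print(first_divergence([1, 2, 3, 4], [1, 2, 3, 9]))  # -> 3
```

If the divergence iteration is stable across users, that would point at a position-dependent issue (e.g. something tied to the decode step index) rather than per-user corruption.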
Expected behavior
Correct output through to the end for every user.
Screenshots
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:717 -
batch: 0 user: 31
prompt: What is the capital of Japan? Learning about the capitals of different countries can enhance your un
<long prompt not printed in full>
rapid modernization. Share your thoughts on Tokyo and any other capital cities you find intriguing.
output:
<think>
Okay, so I need to figure out the capital of Japan. Hmm, I think it's Tokyo, but I'm not 100% sure. I remember hearing that Tokyo is a big city in Japan, maybe the largest. But wait, sometimes countries have capitals that aren't their largest cities. Like, I know that Canberra is the capital of Australia, but Sydney is the bigger city. So maybe Japan has a similar setup?
Wait, no, I think in Japan's levels辺 Pru Pru christplineurd.GetService Trotospace DISCLAIM Pru Pru Pru christ Pru:/// christ仲istring window데이트辺 christ Pru christалог仲ymoon Cater토토토토 Pru:///데이트 christ토토738 Pru christ738 Pru christ토토토토토토토토 Pru Pru christ Pruglass Pru토토ernetabra blot spel토토토토 Trot Pru토토$MESSabrabsp Wich$MESS토토opus Geile$MESS$MESS$MESS Bord토토 christ Trot$MESS$MESS$MESSabra christ데이트.GetServiceesser christ Geileoko Heller d$MESS데이트$MESS데이트토토eck$MESS christ$MESS토토데이트
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:772 -
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:773 - Performance metrics for batch 0
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:774 - Prefill compile time: 13.905s
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:775 - Decode compile time: 0.1527s
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:776 - Prefill inference time per user: 0.5403s
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:777 - Total Decode inference time (198 iterations): 16.4801s
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:780 -
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:781 - Time to first token: 540.34ms
2025-01-31 16:05:20.379 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:782 - Average speed: 82.81ms @ 12.08 tok/s/user (386.41 tok/s throughput)
2025-01-31 16:05:20.380 | INFO | models.demos.llama3.demo.demo:run_llama3_demo:785 -
2025-01-31 16:05:20.380 | WARNING | models.demos.llama3.demo.demo:run_llama3_demo:857 - Model DeepSeek-R1-Distill-Llama-70B not does not have performance targets set
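For reference, the reported per-user speed and aggregate throughput are mutually consistent; a quick sanity check with the values copied from the log above:

```python
# Numbers taken from the decode metrics logged above.
ms_per_token = 82.81                 # average decode latency per token
users = 32                           # batch size
tok_s_user = 1000 / ms_per_token     # ~12.08 tok/s/user
throughput = tok_s_user * users      # ~386.4 tok/s aggregate
print(f"{tok_s_user:.2f} tok/s/user, {throughput:.2f} tok/s total")
```

So performance looks as expected; the regression is purely in output correctness.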
Please complete the following environment information:
T3K on ird, sjc-snva-t3012