assert_packing_loss.py Invalid for deepseek-v2-lite #266
Hi @bao-xiaoyi, can you send me the command you ran?
python assert_packing_loss.py /kas/kas_workspace/open_llm/DeepSeek-Coder-V2-Lite-Instruct
Additionally, when I test with Starcoderv2, errors are also reported:
When I use starcoderv2, original_token_count = 147277 and mk_token_count = 4014.
Hi @bao-xiaoyi, I think this error occurs because this model uses remote code (that is, it loads modeling_deepseek.py). So you can do as follows:
In assert_packing_loss.py, you can make the following change:
|
@bao-xiaoyi for starcoder, which base_model did you use? I tested the following command and it works:
|
I chose the 15b model, and the average loss is a bit large.
I don't quite understand why local code has to be used when packing, while remote code can be used when not packing?
Moreover, the time comparison does not seem as dramatic as shown in the README. I tested DeepSeek using the code you modified, and the time comparison is 18.712671 vs 7.400667, or 9.163215 vs 6.737796.
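One way to read the remote-code explanation given earlier in the thread is sketched below with toy modules. This is only an illustration, not the project's code: `make_module`, `get_lengths`, `forward`, and `packed_lengths` are hypothetical names, and the sketch assumes that packing is enabled by monkey-patching a helper inside the library's own modeling module. Under that assumption, a model loaded from a separate remote file keeps its own unpatched copy of the helper.

```python
import types


def make_module(name):
    """Build a toy 'modeling' module whose forward() calls a module-level helper."""
    mod = types.ModuleType(name)
    source = (
        "def get_lengths(mask):\n"
        "    # Unpacked behaviour: treat the whole row as one sequence.\n"
        "    return [sum(mask)]\n"
        "def forward(mask):\n"
        "    return get_lengths(mask)\n"
    )
    exec(source, mod.__dict__)
    return mod


# Hypothetical stand-ins: one module shipped with the library, and one
# loaded separately (as remote code such as modeling_deepseek.py would be).
library_model = make_module("library_model")
remote_model = make_module("remote_model")


def packed_lengths(mask):
    """Packed behaviour: mask holds a 1-based sequence index per token, 0 is padding."""
    lengths = {}
    for seq_id in mask:
        if seq_id > 0:
            lengths[seq_id] = lengths.get(seq_id, 0) + 1
    return [lengths[k] for k in sorted(lengths)]


# The monkey-patch replaces the helper only in the module it is applied to.
library_model.get_lengths = packed_lengths

mask = [1, 1, 1, 2, 2, 0]  # two packed sequences (3 and 2 tokens) plus padding
patched = library_model.forward(mask)
unpatched = remote_model.forward(mask)
print("patched module:  ", patched)
print("unpatched module:", unpatched)
```

In this toy setup the patched module reports one length per packed sequence, while the separately loaded module still treats the row as a single sequence and misreads the mask. Without packing there is nothing to patch, which would be consistent with remote code working in that case.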
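For reference, the ratios implied by the two timing pairs quoted above can be worked out with a short sketch. The thread does not state which figure in each pair is the packed run or what unit the timings are in, so the code only computes first-over-second ratios; `speedup_ratios` is a hypothetical helper name.

```python
# Timing pairs exactly as quoted in the comment (units not stated in the thread).
pairs = [(18.712671, 7.400667), (9.163215, 6.737796)]


def speedup_ratios(timing_pairs):
    """Return first/second for each (first, second) timing pair."""
    return [first / second for first, second in timing_pairs]


ratios = speedup_ratios(pairs)
for (first, second), ratio in zip(pairs, ratios):
    print(f"{first} vs {second}: ratio {ratio:.2f}x")
```

The two pairs work out to roughly 2.5x and roughly 1.4x.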
@bao-xiaoyi I think directly monkey-patching remote code ( |
By the way, I have just run:
|
Can you provide the time comparison results from your testing on DeepSeek? Thank you very much.
When running this:
RuntimeError: CUDA error: an illegal memory access was encountered
Looking forward to the expert's answer