Add Deepseek-v3 #63
base: main
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
We should probably submit the original model before this PR so the diff is easy to spot.
Great to get it running! A few high-level changes:
Stamp to unblock; feel free to merge after the change.
Can we add a unit test to make sure it runs and produces correct results compared to CPU eager? Example for Llama: https://github.com/AI-Hypercomputer/torchprime/blob/main/torchprime/experimental/torchax_models/test/test_llama.py#L40 Btw, we also need to fix lint. You can do that by running … Thanks!
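A minimal sketch of the kind of parity test being asked for, modeled loosely on the linked llama test. The `ModelArgs`/`Transformer` names and the exact torchax calls (`torchax.default_env()`, the `'jax'` device) are assumptions here and should be replaced with whatever the PR's model.py and the existing torchax_models tests actually use:

```python
import torch
import torchax  # assumed import, as in torchprime's torchax_models tests


def test_deepseek_v3_matches_cpu_eager():
  """Run a tiny config on CPU eager and under torchax, then compare outputs."""
  torch.manual_seed(0)

  # Hypothetical tiny config; real constructor/field names come from the PR's model.py.
  args = ModelArgs(vocab_size=128, dim=64, n_layers=2, n_heads=4, max_seq_len=32)
  model = Transformer(args).eval()

  tokens = torch.randint(0, args.vocab_size, (1, 16))
  with torch.no_grad():
    expected = model(tokens)  # CPU eager reference

  # Re-run the same weights on the jax device via torchax (API assumed from the llama test).
  env = torchax.default_env()
  with env:
    model_jax = model.to('jax')
    actual = model_jax(tokens.to('jax'))

  torch.testing.assert_close(actual.to('cpu'), expected, atol=1e-4, rtol=1e-4)
```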
@miladm It looks like only part of the model is converted to the jax device. I reverted the code change in model.py and tried to call … I can first rewrite the rms_norm op into fine-grained ops, as the llama model does. @qihqi, BTW, do we apply decomposition before converting torch ops to jax ops?
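For reference, a sketch of what the fine-grained rewrite could look like, following the RMSNorm used by the llama model (plain elementwise ops instead of a fused rms_norm, so each op lowers to jax individually); this is illustrative rather than the code in this PR:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
  """RMSNorm built from elementwise ops so every piece lowers individually."""

  def __init__(self, dim: int, eps: float = 1e-6):
    super().__init__()
    self.eps = eps
    self.weight = nn.Parameter(torch.ones(dim))

  def _norm(self, x: torch.Tensor) -> torch.Tensor:
    # mean of squares over the last dim, then multiply by the reciprocal square root
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    output = self._norm(x.float()).type_as(x)
    return output * self.weight
```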
Goal:
Non-Goal / Next-Steps:
- fp8_gmm
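For context on the deferred fp8_gmm item, here is a plain (non-fp8) reference grouped matmul of the kind an fp8_gmm kernel would accelerate with fp8-quantized inputs; this is purely illustrative and not part of the PR:

```python
import torch


def reference_gmm(lhs: torch.Tensor, rhs: torch.Tensor, group_sizes: torch.Tensor) -> torch.Tensor:
  """Reference grouped matmul for MoE-style expert dispatch.

  lhs: [tokens, k] rows already sorted by group,
  rhs: [num_groups, k, n] one weight matrix per group/expert,
  group_sizes: [num_groups] number of rows belonging to each group.
  """
  outputs = []
  start = 0
  for g, size in enumerate(group_sizes.tolist()):
    outputs.append(lhs[start:start + size] @ rhs[g])
    start += size
  return torch.cat(outputs, dim=0)
```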