Add Deepseek-v3 #63

Open · wants to merge 5 commits into main
Conversation

miladm (Collaborator) commented Jan 30, 2025

Goal:

  • Add Deepseek-v3
  • Enable single-chip TPU functionality
  • Use TorchAx (a minimal single-chip sketch follows below)

Non-Goals / Next Steps:

  • Real input tensors
  • Real weights
  • FP8 quantization kernel enablement (fp8_gmm)
  • Distributed
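
As a concrete illustration of the single-chip goal, here is a minimal sketch of running a torch model on one TPU chip through torchax. It assumes torchax's enable_globally() and 'jax' device API, and a tiny stand-in module takes the place of the forked Deepseek-v3 model; it is not the PR's actual entry point.

```python
# Minimal single-chip sketch. Assumes a TPU VM with torchax installed;
# the Sequential below is a stand-in for the forked Deepseek-v3 model.
import torch
import torchax

torchax.enable_globally()  # torch ops on the 'jax' device now dispatch to JAX

model = torch.nn.Sequential(
    torch.nn.Embedding(32000, 64),
    torch.nn.Linear(64, 32000),
).eval()
model = model.to('jax')  # move weights onto the JAX/TPU backend

tokens = torch.randint(0, 32000, (1, 128)).to('jax')  # placeholder input ids
with torch.no_grad():
    logits = model(tokens)
print(logits.shape)  # torch.Size([1, 128, 32000])
```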

miladm self-assigned this Jan 30, 2025

google-cla bot commented Jan 30, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

miladm requested review from qihqi and yaochengji on January 30, 2025 09:58
miladm (Collaborator, Author) commented Jan 30, 2025

Should probably submit the original model before this PR to easily spot the diff.

qihqi (Collaborator) commented Jan 30, 2025

Great to get it running! A few high-level changes:

  1. Let's remove all the README.md files, along with their perf graphs, PDFs, etc. Instead, in the .py files that we forked, add the URL of the GitHub repo we forked them from.
  2. Make a new README.md with the command you run to launch it on TPU.
  3. Edit requirements.txt to list the TPU requirements.

qihqi (Collaborator) left a review comment:

Stamp to unblock; feel free to merge after the change.

tengyifei (Collaborator) commented Jan 30, 2025

Can we add a unit test to make sure it runs and produces correct results as compared to CPU eager? Example for Llama: https://github.com/AI-Hypercomputer/torchprime/blob/main/torchprime/experimental/torchax_models/test/test_llama.py#L40
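
For concreteness, a rough sketch of what such a parity test could look like, loosely modeled on the linked test_llama.py. The small Sequential is a placeholder for the Deepseek-v3 Model, and the torchax 'jax'/'cpu' round-trip is an assumption:

```python
# Rough sketch of the requested parity test: run the same input through the
# model on CPU eager and on the torchax 'jax' device, then compare outputs.
import unittest
import torch
import torchax

class DeepseekParityTest(unittest.TestCase):
    def test_matches_cpu_eager(self):
        torch.manual_seed(0)
        # Stand-in for a small Deepseek-v3 config; swap in the real Model.
        model = torch.nn.Sequential(
            torch.nn.Embedding(1000, 32),
            torch.nn.Linear(32, 1000),
        ).eval()
        tokens = torch.randint(0, 1000, (1, 16))

        with torch.no_grad():
            expected = model(tokens)  # CPU eager reference output

        torchax.enable_globally()
        with torch.no_grad():
            actual = model.to('jax')(tokens.to('jax'))

        # Moving the result back via .to('cpu') is assumed to be supported.
        torch.testing.assert_close(actual.to('cpu'), expected,
                                   atol=1e-3, rtol=1e-3)

if __name__ == '__main__':
    unittest.main()
```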

Btw, we also need to fix lint. You can do that by running `ruff check` and `ruff format`.

Thanks!

yaochengji (Collaborator) commented

@miladm it looks like only part of the model is converted to the jax device.

I reverted the code change in model.py and tried to call model.to("jax") outside instead. I got the error: torchax.tensor.OperatorNotFound: Operator with name aten::rms_norm has no lowering.

I can first rewrite the rms_norm op into fine-grained ops, as the llama model does. @qihqi, BTW, do we apply decompositions before converting torch ops to jax ops?
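
For reference, a sketch of that workaround: an RMSNorm expressed with fine-grained ops (pow, mean, rsqrt, mul) that already have lowerings, the way the Llama port does, instead of relying on aten::rms_norm:

```python
import torch

class RMSNorm(torch.nn.Module):
    """RMSNorm built from elementwise/reduction ops, avoiding aten::rms_norm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x * rsqrt(mean(x^2) + eps), computed in float32 for stability
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x = (x.float() * torch.rsqrt(variance + self.eps)).type_as(x)
        return x * self.weight
```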
