Add Deepseek-v3 #63

Open
wants to merge 5 commits into
base: main
172 changes: 172 additions & 0 deletions torchprime/experimental/torchax_models/DeepSeek-V3/.gitignore
@@ -0,0 +1,172 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.vscode/*

.DS_Store
215 changes: 215 additions & 0 deletions torchprime/experimental/torchax_models/DeepSeek-V3/CITATION.cff
@@ -0,0 +1,215 @@
cff-version: 1.2.0
message: "If you use this work, please cite it using the following metadata."
title: "DeepSeek-V3 Technical Report"
authors:
- name: "DeepSeek-AI"
- name: "Aixin Liu"
- name: "Bei Feng"
- name: "Bing Xue"
- name: "Bingxuan Wang"
- name: "Bochao Wu"
- name: "Chengda Lu"
- name: "Chenggang Zhao"
- name: "Chengqi Deng"
- name: "Chenyu Zhang"
- name: "Chong Ruan"
- name: "Damai Dai"
- name: "Daya Guo"
- name: "Dejian Yang"
- name: "Deli Chen"
- name: "Dongjie Ji"
- name: "Erhang Li"
- name: "Fangyun Lin"
- name: "Fucong Dai"
- name: "Fuli Luo"
- name: "Guangbo Hao"
- name: "Guanting Chen"
- name: "Guowei Li"
- name: "H. Zhang"
- name: "Han Bao"
- name: "Hanwei Xu"
- name: "Haocheng Wang"
- name: "Haowei Zhang"
- name: "Honghui Ding"
- name: "Huajian Xin"
- name: "Huazuo Gao"
- name: "Hui Li"
- name: "Hui Qu"
- name: "J. L. Cai"
- name: "Jian Liang"
- name: "Jianzhong Guo"
- name: "Jiaqi Ni"
- name: "Jiashi Li"
- name: "Jiawei Wang"
- name: "Jin Chen"
- name: "Jingchang Chen"
- name: "Jingyang Yuan"
- name: "Junjie Qiu"
- name: "Junlong Li"
- name: "Junxiao Song"
- name: "Kai Dong"
- name: "Kai Hu"
- name: "Kaige Gao"
- name: "Kang Guan"
- name: "Kexin Huang"
- name: "Kuai Yu"
- name: "Lean Wang"
- name: "Lecong Zhang"
- name: "Lei Xu"
- name: "Leyi Xia"
- name: "Liang Zhao"
- name: "Litong Wang"
- name: "Liyue Zhang"
- name: "Meng Li"
- name: "Miaojun Wang"
- name: "Mingchuan Zhang"
- name: "Minghua Zhang"
- name: "Minghui Tang"
- name: "Mingming Li"
- name: "Ning Tian"
- name: "Panpan Huang"
- name: "Peiyi Wang"
- name: "Peng Zhang"
- name: "Qiancheng Wang"
- name: "Qihao Zhu"
- name: "Qinyu Chen"
- name: "Qiushi Du"
- name: "R. J. Chen"
- name: "R. L. Jin"
- name: "Ruiqi Ge"
- name: "Ruisong Zhang"
- name: "Ruizhe Pan"
- name: "Runji Wang"
- name: "Runxin Xu"
- name: "Ruoyu Zhang"
- name: "Ruyi Chen"
- name: "S. S. Li"
- name: "Shanghao Lu"
- name: "Shangyan Zhou"
- name: "Shanhuang Chen"
- name: "Shaoqing Wu"
- name: "Shengfeng Ye"
- name: "Shirong Ma"
- name: "Shiyu Wang"
- name: "Shuang Zhou"
- name: "Shuiping Yu"
- name: "Shunfeng Zhou"
- name: "Shuting Pan"
- name: "T. Wang"
- name: "Tao Yun"
- name: "Tian Pei"
- name: "Tianyu Sun"
- name: "W. L. Xiao"
- name: "Wangding Zeng"
- name: "Wanjia Zhao"
- name: "Wei An"
- name: "Wen Liu"
- name: "Wenfeng Liang"
- name: "Wenjun Gao"
- name: "Wenqin Yu"
- name: "Wentao Zhang"
- name: "X. Q. Li"
- name: "Xiangyue Jin"
- name: "Xianzu Wang"
- name: "Xiao Bi"
- name: "Xiaodong Liu"
- name: "Xiaohan Wang"
- name: "Xiaojin Shen"
- name: "Xiaokang Chen"
- name: "Xiaokang Zhang"
- name: "Xiaosha Chen"
- name: "Xiaotao Nie"
- name: "Xiaowen Sun"
- name: "Xiaoxiang Wang"
- name: "Xin Cheng"
- name: "Xin Liu"
- name: "Xin Xie"
- name: "Xingchao Liu"
- name: "Xingkai Yu"
- name: "Xinnan Song"
- name: "Xinxia Shan"
- name: "Xinyi Zhou"
- name: "Xinyu Yang"
- name: "Xinyuan Li"
- name: "Xuecheng Su"
- name: "Xuheng Lin"
- name: "Y. K. Li"
- name: "Y. Q. Wang"
- name: "Y. X. Wei"
- name: "Y. X. Zhu"
- name: "Yang Zhang"
- name: "Yanhong Xu"
- name: "Yanping Huang"
- name: "Yao Li"
- name: "Yao Zhao"
- name: "Yaofeng Sun"
- name: "Yaohui Li"
- name: "Yaohui Wang"
- name: "Yi Yu"
- name: "Yi Zheng"
- name: "Yichao Zhang"
- name: "Yifan Shi"
- name: "Yiliang Xiong"
- name: "Ying He"
- name: "Ying Tang"
- name: "Yishi Piao"
- name: "Yisong Wang"
- name: "Yixuan Tan"
- name: "Yiyang Ma"
- name: "Yiyuan Liu"
- name: "Yongqiang Guo"
- name: "Yu Wu"
- name: "Yuan Ou"
- name: "Yuchen Zhu"
- name: "Yuduan Wang"
- name: "Yue Gong"
- name: "Yuheng Zou"
- name: "Yujia He"
- name: "Yukun Zha"
- name: "Yunfan Xiong"
- name: "Yunxian Ma"
- name: "Yuting Yan"
- name: "Yuxiang Luo"
- name: "Yuxiang You"
- name: "Yuxuan Liu"
- name: "Yuyang Zhou"
- name: "Z. F. Wu"
- name: "Z. Z. Ren"
- name: "Zehui Ren"
- name: "Zhangli Sha"
- name: "Zhe Fu"
- name: "Zhean Xu"
- name: "Zhen Huang"
- name: "Zhen Zhang"
- name: "Zhenda Xie"
- name: "Zhengyan Zhang"
- name: "Zhewen Hao"
- name: "Zhibin Gou"
- name: "Zhicheng Ma"
- name: "Zhigang Yan"
- name: "Zhihong Shao"
- name: "Zhipeng Xu"
- name: "Zhiyu Wu"
- name: "Zhongyu Zhang"
- name: "Zhuoshu Li"
- name: "Zihui Gu"
- name: "Zijia Zhu"
- name: "Zijun Liu"
- name: "Zilin Li"
- name: "Ziwei Xie"
- name: "Ziyang Song"
- name: "Ziyi Gao"
- name: "Zizheng Pan"
year: 2024
identifiers:
- type: doi
  value: 10.48550/arXiv.2412.19437
- type: arXiv
  value: 2412.19437
url: "https://arxiv.org/abs/2412.19437"
categories:
- "cs.CL"
repository-code: "https://github.com/deepseek-ai/DeepSeek-V3"
license: "MIT"
abstract: >
  We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.
Binary file not shown.