feat: custom [megatron] nvidia dmc loader #39
base: main
@GangGreenTemperTatum can you re-run with the latest version of Dyana? I don't see the GPU stages in your report. Can you also attach the trace.json?
my bad! admittedly i fell behind on the eta for this and may have missed some updates since i first cloned and started working on the loader. made some additional fixes in this commit:
Environment Configuration
Process Management
latest trace / cli:
Total reclaimed space: 20.91GB
🐳 loader: initializing loader megatron
Step 1/43 : FROM nvcr.io/nvidia/pytorch:24.04-py3
---> 3f0b23af1f4f
Step 2/43 : WORKDIR /app
---> Running in 5846641047fe
---> Removed intermediate container 5846641047fe
---> 80b2ec5ee433
Step 3/43 : RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates
build-essential && rm -rf /var/lib/apt/lists/*
---> Running in e894b4f30bcd
Get:1 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]
Get:7 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1522 kB]
Get:10 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [3742 kB]
Get:11 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [53.3 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2906 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [81.4 kB]
Get:14 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [35.2 kB]
Get:15 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1229 kB]
Get:16 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [45.2 kB]
Get:17 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3606 kB]
Get:18 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2604 kB]
Fetched 36.2 MB in 4s (8318 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
build-essential is already the newest version (12.9ubuntu3).
Suggested packages:
gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui
gitk gitweb git-cvs git-mediawiki git-svn
The following packages will be upgraded:
ca-certificates git
2 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.
Need to get 3328 kB of archives.
After this operation, 29.7 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 ca-certificates all 20240203~22.04.1 [162 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 git amd64 1:2.34.1-1ubuntu1.12 [3165 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 3328 kB in 2s (2122 kB/s)
(Reading database ... 23388 files and directories currently installed.)
Preparing to unpack .../ca-certificates_20240203~22.04.1_all.deb ...
Unpacking ca-certificates (20240203~22.04.1) over (20230311ubuntu0.22.04.1) ...
Preparing to unpack .../git_1%3a2.34.1-1ubuntu1.12_amd64.deb ...
Unpacking git (1:2.34.1-1ubuntu1.12) over (1:2.34.1-1ubuntu1.10) ...
Setting up ca-certificates (20240203~22.04.1) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
14 added, 5 removed; done.
Setting up git (1:2.34.1-1ubuntu1.12) ...
Processing triggers for ca-certificates (20240203~22.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
---> Removed intermediate container e894b4f30bcd
---> 20ba0b9ef3d6
Step 4/43 : ENV CUDA_HOME=/usr/local/cuda
---> Running in 9c323b9d3195
---> Removed intermediate container 9c323b9d3195
---> 8d29b0410634
Step 5/43 : ENV PATH=/usr/local/cuda/bin:$PATH
---> Running in 087c885a889b
---> Removed intermediate container 087c885a889b
---> 2bf8b96da4fd
Step 6/43 : ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
---> Running in 704f029cfe4d
---> Removed intermediate container 704f029cfe4d
---> 6f46f2159747
Step 7/43 : ENV CUDA_LAUNCH_BLOCKING=1
---> Running in 01b914789cc6
---> Removed intermediate container 01b914789cc6
---> 1a4828d1a918
Step 8/43 : ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32
---> Running in f360baadffb8
---> Removed intermediate container f360baadffb8
---> eb509cb8dd9f
Step 9/43 : ENV CUDA_MODULE_LOADING=LAZY
---> Running in 852660da4441
---> Removed intermediate container 852660da4441
---> febd2c0524da
Step 10/43 : ENV TORCH_USE_CUDA_DSA=0
---> Running in 5173befd6dca
---> Removed intermediate container 5173befd6dca
---> bdf924875416
Step 11/43 : ENV CUDA_DEVICE_MAX_CONNECTIONS=1
---> Running in 2f5d93cbebe4
---> Removed intermediate container 2f5d93cbebe4
---> 9a5a0923a8b7
Step 12/43 : ENV NCCL_ASYNC_ERROR_HANDLING=1
---> Running in 34113ab2ebaa
---> Removed intermediate container 34113ab2ebaa
---> 0de44ca15148
Step 13/43 : ENV OMP_NUM_THREADS=1
---> Running in f6bf33a8cd7c
---> Removed intermediate container f6bf33a8cd7c
---> b3a651e90a0e
Step 14/43 : ENV NVTE_FRAMEWORK=pytorch
---> Running in ca0da99992c2
---> Removed intermediate container ca0da99992c2
---> 594d7694931b
Step 15/43 : ENV MAX_JOBS=4
---> Running in ca3543e75ea7
---> Removed intermediate container ca3543e75ea7
---> b5fc490025bc
Step 16/43 : ENV DEBIAN_FRONTEND=noninteractive
---> Running in 1f720f355360
---> Removed intermediate container 1f720f355360
---> bfc6167504d7
Step 17/43 : ENV TORCH_CUDNN_V8_API_ENABLED=1
---> Running in 03a5a9c9f2a2
---> Removed intermediate container 03a5a9c9f2a2
---> ebb5a27f12ef
Step 18/43 : ENV TORCH_ALLOW_TF32=1
---> Running in 9c8633285325
---> Removed intermediate container 9c8633285325
---> c042d2d289a8
Step 19/43 : ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0"
---> Running in 2b6c6a90ce57
---> Removed intermediate container 2b6c6a90ce57
---> 13385a36b093
Step 20/43 : ENV PYTORCH_JIT=0
---> Running in 41809fdef488
---> Removed intermediate container 41809fdef488
---> 2fd362f6799f
Step 21/43 : ENV TORCH_INDUCTOR_DISABLE_CUDA_GRAPH=1
---> Running in 11c3d7745f43
---> Removed intermediate container 11c3d7745f43
---> e2f7fa3718d8
Step 22/43 : ENV TORCH_INDUCTOR_USE_PYTHON_BINDING=0
---> Running in ecad131853a9
---> Removed intermediate container ecad131853a9
---> 113404e7384e
Step 23/43 : ENV PYTHONFAULTHANDLER=1
---> Running in 86907ed59d03
---> Removed intermediate container 86907ed59d03
---> b33f9f7060a8
Step 24/43 : ENV PYTHONUNBUFFERED=1
---> Running in 19dfa8d1d39f
---> Removed intermediate container 19dfa8d1d39f
---> cc0c2f62fa95
Step 25/43 : ENV NCCL_IB_DISABLE=1
---> Running in 5264bcf89e19
---> Removed intermediate container 5264bcf89e19
---> ae8cbf63dae2
Step 26/43 : ENV PYTORCH_NO_CUDA_MEMORY_CACHING=1
---> Running in bf673d401f64
---> Removed intermediate container bf673d401f64
---> 37aa1cf00693
Step 27/43 : ENV TORCH_SHOW_CPP_STACKTRACES=0
---> Running in c1f474a874a8
---> Removed intermediate container c1f474a874a8
---> ebf0470bd4fa
Step 28/43 : ENV PYTHONWARNINGS=ignore
---> Running in 455fba25d402
---> Removed intermediate container 455fba25d402
---> 593795b5e027
Step 29/43 : RUN python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
---> Running in 0fdf384ebf9f
PyTorch version: 2.3.0a0+6ddf5cf85e.nv24.04
---> Removed intermediate container 0fdf384ebf9f
---> 178fb4e96dc9
Step 30/43 : RUN mkdir -p /app/workspace
---> Running in f472e2264cdc
---> Removed intermediate container f472e2264cdc
---> d00e44b9ac9d
Step 31/43 : COPY requirements.txt /app/workspace/
---> 61e71e98c4a2
Step 32/43 : COPY *.py /app/workspace/
---> e41e60817ab2
Step 33/43 : COPY dyana-requirements*.txt /app/workspace/
---> 440647b3f6af
Step 34/43 : WORKDIR /app/workspace
---> Running in d52cbf7fb3a4
---> Removed intermediate container d52cbf7fb3a4
---> c101bde2bc20
Step 35/43 : RUN pip install --no-cache-dir -r requirements.txt
---> Running in 4a0981977b86
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com,
https://download.pytorch.org/whl/cu121
Looking in links: https://developer.download.nvidia.com/compute/redist
Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 5)) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 6)) (23.2)
Requirement already satisfied: typing_extensions>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 7)) (4.10.0)
Collecting flash-attn==2.6.1 (from -r requirements.txt (line 10))
Downloading flash_attn-2.6.1.tar.gz (2.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 8.7 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting sentencepiece==0.2.0 (from -r requirements.txt (line 11))
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 12))
Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting hydra_colorlog==1.2.0 (from -r requirements.txt (line 13))
Downloading hydra_colorlog-1.2.0-py3-none-any.whl.metadata (949 bytes)
Collecting nltk (from -r requirements.txt (line 14))
Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting datasets (from -r requirements.txt (line 15))
Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: psutil>=5.6.7 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 18)) (5.9.4)
Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.6.1->-r
requirements.txt (line 10)) (0.7.0)
Collecting omegaconf<2.4,>=2.2 (from hydra-core==1.3.2->-r requirements.txt (line 12))
Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r requirements.txt (line 12))
Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 31.6 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting colorlog (from hydra_colorlog==1.2.0->-r requirements.txt (line 13))
Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.13.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2024.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (4.66.2)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.24.4)
Collecting pyarrow>=15.0.0 (from datasets->-r requirements.txt (line 15))
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets->-r requirements.txt (line 15))
Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.5.3)
Collecting requests>=2.32.2 (from datasets->-r requirements.txt (line 15))
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tqdm (from nltk->-r requirements.txt (line 14))
Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.7/57.7 kB 67.3 MB/s eta 0:00:00
Collecting xxhash (from datasets->-r requirements.txt (line 15))
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets->-r requirements.txt (line 15))
Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (3.9.3)
Collecting huggingface-hub>=0.23.0 (from datasets->-r requirements.txt (line 15))
Downloading huggingface_hub-0.28.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (6.0.1)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (23.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.4.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.9.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (4.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (2024.2.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch>=2.0.0->-r requirements.txt (line 5)) (2.1.5)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from
pandas->datasets->-r requirements.txt (line 15)) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from
pandas->datasets->-r requirements.txt (line 15)) (2024.1)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch>=2.0.0->-r requirements.txt (line 5)) (1.3.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from
python-dateutil>=2.8.1->pandas->datasets->-r requirements.txt (line 15)) (1.16.0)
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 11.1 MB/s eta 0:00:00
Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.5/154.5 kB 18.5 MB/s eta 0:00:00
Downloading hydra_colorlog-1.2.0-py3-none-any.whl (3.6 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 11.8 MB/s eta 0:00:00
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.6/480.6 kB 12.8 MB/s eta 0:00:00
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 21.9 MB/s eta 0:00:00
Downloading huggingface_hub-0.28.1-py3-none-any.whl (464 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 464.1/464.1 kB 12.7 MB/s eta 0:00:00
Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 32.5 MB/s eta 0:00:00
Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 65.7 MB/s eta 0:00:00
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (42.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 11.3 MB/s eta 0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 59.4 MB/s eta 0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 54.0 MB/s eta 0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 23.9 MB/s eta 0:00:00
Building wheels for collected packages: flash-attn, antlr4-python3-runtime
Building wheel for flash-attn (setup.py): started
Building wheel for flash-attn (setup.py): finished with status 'done'
Created wheel for flash-attn: filename=flash_attn-2.6.1-cp310-cp310-linux_x86_64.whl size=198444860
sha256=8da92ac0324e367e37327e6cdd621b9e0a02e72b6328c7ad8096ffa4a9c9b699
Stored in directory:
/tmp/pip-ephem-wheel-cache-hyqa6f8e/wheels/91/6a/38/f0faa036b4ac73a73247386f1ab1bb4cb4f6e72e6861a779f1
Building wheel for antlr4-python3-runtime (setup.py): started
Building wheel for antlr4-python3-runtime (setup.py): finished with status 'done'
Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554
sha256=003a14a84fed98340bda66aef955f49cc9035b67d7c4dd734acb93f6bd24b931
Stored in directory:
/tmp/pip-ephem-wheel-cache-hyqa6f8e/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88
Successfully built flash-attn antlr4-python3-runtime
Installing collected packages: sentencepiece, antlr4-python3-runtime, xxhash, tqdm, requests, pyarrow,
omegaconf, dill, colorlog, nltk, multiprocess, hydra-core, huggingface-hub, hydra_colorlog, flash-attn,
datasets
Attempting uninstall: tqdm
Found existing installation: tqdm 4.66.2
Uninstalling tqdm-4.66.2:
Successfully uninstalled tqdm-4.66.2
Attempting uninstall: requests
Found existing installation: requests 2.31.0
Uninstalling requests-2.31.0:
Successfully uninstalled requests-2.31.0
Attempting uninstall: pyarrow
Found existing installation: pyarrow 14.0.1
Uninstalling pyarrow-14.0.1:
Successfully uninstalled pyarrow-14.0.1
Attempting uninstall: flash-attn
Found existing installation: flash-attn 2.4.2
Uninstalling flash-attn-2.4.2:
Successfully uninstalled flash-attn-2.4.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are
installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.17.0a0 requires torch==2.3.0a0+6ddf5cf, but you have torch 2.3.0a0+6ddf5cf85e.nv24.4 which is
incompatible.
transformer-engine 1.5.0+6a9edc3 requires flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6, but you have flash-attn
2.6.1 which is incompatible.
Successfully installed antlr4-python3-runtime-4.9.3 colorlog-6.9.0 datasets-3.2.0 dill-0.3.8 flash-attn-2.6.1
huggingface-hub-0.28.1 hydra-core-1.3.2 hydra_colorlog-1.2.0 multiprocess-0.70.16 nltk-3.9.1 omegaconf-2.3.0
pyarrow-19.0.0 requests-2.32.3 sentencepiece-0.2.0 tqdm-4.67.1 xxhash-3.5.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with
the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container 4a0981977b86
---> 8ec31874d825
Step 36/43 : RUN git clone --depth 1 --branch dmc https://github.com/NVIDIA/Megatron-LM.git /app/Megatron-LM
&& cd /app/Megatron-LM && pip install -e .
---> Running in b9756ac9ad47
Cloning into '/app/Megatron-LM'...
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///app/Megatron-LM
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with status 'done'
Getting requirements to build editable: started
Getting requirements to build editable: finished with status 'done'
Preparing editable metadata (pyproject.toml): started
Preparing editable metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (23.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.13.3)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (4.10.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch->megatron-core==0.10.0rc0) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch->megatron-core==0.10.0rc0) (1.3.0)
Building wheels for collected packages: megatron-core
Building editable for megatron-core (pyproject.toml): started
Building editable for megatron-core (pyproject.toml): finished with status 'done'
Created wheel for megatron-core: filename=megatron_core-0.10.0rc0-0.editable-cp310-cp310-linux_x86_64.whl
size=16959 sha256=8118543f69b1ca10dfa02d3d97fd4320d01ee68fb8dd5f96f162cf89f3ea8e60
Stored in directory:
/tmp/pip-ephem-wheel-cache-uxpcwgl2/wheels/ac/13/e6/1ba2b5b3bea71b4ae468d84030e90cc700c9d586023e483a1e
Successfully built megatron-core
Installing collected packages: megatron-core
Successfully installed megatron-core-0.10.0rc0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with
the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container b9756ac9ad47
---> ceff0cbde09f
Step 37/43 : ENV PYTHONPATH=/app/Megatron-LM:$PYTHONPATH
---> Running in f548a5a2acb6
---> Removed intermediate container f548a5a2acb6
---> ec7bebd8534f
Step 38/43 : RUN mkdir -p /dev/shm && mkdir -p /tmp/pytorch_extensions && chmod -R 777 /dev/shm
/tmp/pytorch_extensions
---> Running in 13752a0aed96
---> Removed intermediate container 13752a0aed96
---> 0066fe741c09
Step 39/43 : RUN printf '#!/bin/bash\nexport PYTHONPATH=/app/workspace:/app/Megatron-LM:$PYTHONPATH\nexec
python3 -W ignore main.py "$@"\n' > /app/workspace/entrypoint.sh && chmod +x /app/workspace/entrypoint.sh
---> Running in 2b8ba5cd04b8
---> Removed intermediate container 2b8ba5cd04b8
---> b53fa25d5ec4
Step 40/43 : RUN ls -la /app/workspace && ls -la /app/workspace/entrypoint.sh && test -x
/app/workspace/entrypoint.sh
---> Running in 2ed69204f222
total 48
drwxr-xr-x 1 root root 4096 Feb 4 15:32 .
drwxr-xr-x 1 root root 4096 Feb 4 15:31 ..
-rw-rw-r-- 1 root root 42 Feb 4 15:24 dyana-requirements-gpu.txt
-rw-rw-r-- 1 root root 29 Feb 4 15:24 dyana-requirements.txt
-rw-rw-r-- 1 root root 6886 Feb 4 15:24 dyana.py
-rwxr-xr-x 1 root root 110 Feb 4 15:32 entrypoint.sh
-rw-rw-r-- 1 root root 9809 Feb 4 15:24 main.py
-rw-rw-r-- 1 root root 365 Feb 4 13:52 requirements.txt
-rw-rw-r-- 1 root root 594 Feb 4 15:24 verify.py
-rwxr-xr-x 1 root root 110 Feb 4 15:32 /app/workspace/entrypoint.sh
---> Removed intermediate container 2ed69204f222
---> 5d54d00c357b
Step 41/43 : RUN chown -R root:root /app && chmod -R 755 /app && chmod +x
/app/workspace/entrypoint.sh
---> Running in 58b76ae8eb55
---> Removed intermediate container 58b76ae8eb55
---> 80a60f5e2725
Step 42/43 : SHELL ["/bin/bash", "-c"]
---> Running in 6f1a0c990281
---> Removed intermediate container 6f1a0c990281
---> 1000f37c91b1
Step 43/43 : ENTRYPOINT ["/bin/bash", "-c", "exec /app/workspace/entrypoint.sh \"$@\""]
---> Running in 58c097f1d1ec
---> Removed intermediate container 58c097f1d1ec
---> be753d33648c
Successfully built be753d33648c
Successfully tagged dyana-megatron-loader:latest
👁️🗨️ tracer: initializing ...
👁️🗨️ tracer: started ...
🍿 loader: warning: allowing bridged network access to the container
🍿 loader: executing with arguments ['--model', '/model_optim_rng.pt', '--tokenizer',
'/llama-2-7b-tokenizer.model', '--size', '7B', '--input', 'This is an example prompt.'] ...
👁️🗨️ tracer: stopping ...
🗃 saving 3797 events to trace.json
Platform : Linux-6.8.0-52-generic-x86_64-with-glibc2.35
Loader : megatron
Arguments : --model /model_optim_rng.pt --tokenizer /llama-2-7b-tokenizer.model --size 7B --input This
is an example prompt.
Volumes : /model_optim_rng.pt (/home/ads/model_optim_rng.pt), /llama-2-7b-tokenizer.model
(/home/ads/llama-2-7b-tokenizer.model)
Started at : 2025-02-04T08:32:48.362682
Ended at : 2025-02-04T08:32:52.217769
Total Events : 3797
Stdout : [W init.cpp:767] Warning: nvfuser is no longer supported in torch script, use
_jit_set_nvfuser_enabled is deprecated and a no-op (function operator())
RAM Usage:
* start : 363.5MiB
* cuda_initialized : 476.8MiB 🔺 113.3MiB
* end : 476.8MiB
* end : 477.1MiB 🔺 384.0KiB
Disk Usage:
* start : 1.4TiB
* cuda_initialized : 1.4TiB 🔺 152.0KiB
* end : 1.4TiB 🔺 140.0KiB
* end : 1.4TiB 🔺 164.0KiB
Process Executions:
* 1 python3 -> execve /usr/bin/python3 ['python3', '-W', 'ignore', 'main.py', '/model_optim_rng.pt',
'--tokenizer', '/llama-2-7b-tokenizer.model', '--size', '7B', '--input', 'This is an example prompt.']
* 10 hostname -> execve /usr/bin/hostname ['hostname']
* 19 touch -> execve /usr/bin/touch ['touch', '/usr/local/cuda/compat/.570.86.10.b354001c1141.checked']
* 20 rm -> execve /usr/bin/rm ['rm', '-f', '/usr/local/cuda/compat/lib']
* 30 hostname -> execve /usr/bin/hostname ['hostname']
* 9 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 8 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel Module\\( for *\\| \\) *\\([^()
]*\\).*$/\\2/p', '/proc/driver/nvidia/version']
* 12 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q', '-d', 'COMPUTE']
* 13 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 14 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 18 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 22 timeout -> execve /usr/bin/timeout ['timeout', '-s', 'KILL', '35', '/usr/local/bin/cudaCheck']
* 23 cudaCheck -> execve /usr/local/bin/cudaCheck ['/usr/local/bin/cudaCheck']
* 26 cat -> execve /usr/bin/cat ['cat', '/sys/module/mlx5_core/version']
* 29 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 28 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel Module\\( for *\\| \\) *\\([^()
]*\\).*$/\\2/p', '/proc/driver/nvidia/version']
* 34 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 32 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q', '-d', 'COMPUTE']
* 33 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 38 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 40 cat -> execve /usr/bin/cat ['cat', '/sys/module/mlx5_core/version']
Network Usage:
eth0
start : rx=1.7KiB tx=0.0B
cuda_initialized : rx=1.7KiB tx=0.0B
end : rx=1.7KiB tx=0.0B
end : rx=1.9KiB 🔺 207.0B tx=0.0B
Network Activity:
* [23] cudaCheck -> connect /tmp/nvidia-mps/control
* [1] fuse -> connect /var/run/nscd/socket
* [1] fuse -> connect /tmp/ucx-vfs-root.sock
* [1] python3 -> connect /tmp/nvidia-mps/control
File Accesses:
* /app/Megatron-LM
* /app/workspace
* /app/workspace/__pycache__/dyana.cpython-310.pyc.137882571877808
* /app/workspace/dyana.py
* /app/workspace/entrypoint.sh
* /app/workspace/main.py
* /proc
* /usr/bin/cat
* /usr/bin/cut
* /usr/bin/grep
* /usr/bin/hostname
* /usr/bin/nvidia-smi
* /usr/bin/python3
* /usr/bin/rm
* /usr/bin/sed
* /usr/bin/timeout
* /usr/bin/touch
* /usr/local/bin/cudaCheck
* /usr/local/cuda/compat/.570.86.10.b354001c1141.checked
* /usr/local/cuda/compat/lib.real/libcuda.so.1
* /usr/local/cuda/lib64/libcublas.so.12
* /usr/local/cuda/lib64/libcublasLt.so.12
* /usr/local/cuda/lib64/libcudart.so.12
* /usr/local/cuda/lib64/libcufft.so.11
* /usr/local/cuda/lib64/libcupti.so.12
* /usr/local/cuda/lib64/libcurand.so.10
* /usr/local/cuda/lib64/libcusparse.so.12
* /usr/local/cuda/lib64/libnvJitLink.so.12
* /usr/local/cuda/lib64/libnvToolsExt.so.1
* 2075 accesses to /usr/local/lib/*
* 1114 accesses to /usr/lib/*
* 18 accesses to /lib/*
* 102 accesses to /dev/*
* 146 accesses to /proc/*
* 45 accesses to /sys/*
* 90 accesses to /etc/*
* 1 accesses to /tmp/*
Security Events:
* Dynamic code loading detected (defense-evasion, moderate severity)
trace attached:
@GangGreenTemperTatum there's a problem with the loader: the GPU data is in the trace, but the loader has not used any GPU memory (you can see the used memory is the same at all stages) ... it seems like everything is running on CPU and RAM?
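For anyone checking this on their own run, here is a minimal sketch of the comparison described above: take the used GPU memory at each stage and see whether it ever moves from the first reading. The `(stage, used_bytes)` pairs, the function names, and the 1 MiB threshold are all assumptions for illustration; they are not Dyana's actual trace.json schema, so the readings would need to be pulled out of the trace by hand or with your own parser.

```python
from typing import Iterable, List, Tuple

Stage = Tuple[str, int]  # (stage name, used GPU memory in bytes) - hypothetical shape


def gpu_memory_deltas(stages: Iterable[Stage]) -> List[Tuple[str, int]]:
    """Return each stage's used memory relative to the first stage."""
    stages = list(stages)
    if not stages:
        return []
    baseline = stages[0][1]
    return [(name, used - baseline) for name, used in stages]


def loader_used_gpu(stages: Iterable[Stage], min_delta_bytes: int = 1024 * 1024) -> bool:
    """True if any stage grew by at least min_delta_bytes over the first stage."""
    return any(delta >= min_delta_bytes for _, delta in gpu_memory_deltas(stages))


# Made-up readings, using the stage names that appear in the report.
cpu_only = [("start", 500_000_000), ("cuda_initialized", 500_000_000), ("end", 500_000_000)]
on_gpu = [("start", 500_000_000), ("cuda_initialized", 800_000_000), ("end", 14_000_000_000)]

print(loader_used_gpu(cpu_only))  # flat readings: nothing was placed on the GPU
print(loader_used_gpu(on_gpu))    # growing readings: the loader allocated GPU memory
```

A flat series like `cpu_only` matches what the trace in this thread shows, which is why the model appears to be running on CPU and RAM only.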
awesome spot, thanks @evilsocket, lemme take a look
thanks for the great info @evilsocket! for tracking purposes, commenting this here. i tested removing those variables from the
Total reclaimed space: 20.91GB
🐳 loader: initializing loader megatron
Step 1/36 : FROM nvcr.io/nvidia/pytorch:24.04-py3
---> 3f0b23af1f4f
Step 2/36 : WORKDIR /app
---> Running in 0614c0254e13
---> Removed intermediate container 0614c0254e13
---> 25b151a72bc9
Step 3/36 : RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates
build-essential && rm -rf /var/lib/apt/lists/*
---> Running in e0ebc228422f
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:5 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2606 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3606 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB]
Get:9 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1229 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [45.2 kB]
Get:11 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1522 kB]
Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [53.3 kB]
Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2906 kB]
Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [3742 kB]
Get:17 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [35.2 kB]
Get:18 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [81.4 kB]
Fetched 36.2 MB in 5s (6680 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
build-essential is already the newest version (12.9ubuntu3).
Suggested packages:
gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui
gitk gitweb git-cvs git-mediawiki git-svn
The following packages will be upgraded:
ca-certificates git
2 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.
Need to get 3328 kB of archives.
After this operation, 29.7 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 ca-certificates all 20240203~22.04.1 [162 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 git amd64 1:2.34.1-1ubuntu1.12 [3165 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 3328 kB in 2s (2100 kB/s)
(Reading database ...
(Reading database ... 5%
(Reading database ... 10%(Reading database ... 15%(Reading database ... 20%(Reading database ... 25%(Reading
database ... 30%
(Reading database ... 35%(Reading database ... 40%(Reading database ... 45%
(Reading database ... 50%(Reading database ... 55%(Reading database ... 60%
(Reading database ... 65%
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 100%(Reading database ... 23388 files and directories currently installed.)
Preparing to unpack .../ca-certificates_20240203~22.04.1_all.deb ...
Unpacking ca-certificates (20240203~22.04.1) over (20230311ubuntu0.22.04.1) ...
Preparing to unpack .../git_1%3a2.34.1-1ubuntu1.12_amd64.deb ...
Unpacking git (1:2.34.1-1ubuntu1.12) over (1:2.34.1-1ubuntu1.10) ...
Setting up ca-certificates (20240203~22.04.1) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
14 added, 5 removed; done.
Setting up git (1:2.34.1-1ubuntu1.12) ...
Processing triggers for ca-certificates (20240203~22.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
---> Removed intermediate container e0ebc228422f
---> ff5847b29b13
Step 4/36 : ENV CUDA_HOME=/usr/local/cuda
---> Running in ffa9f1fba0b7
---> Removed intermediate container ffa9f1fba0b7
---> a0e7f6c79a8a
Step 5/36 : ENV PATH=/usr/local/cuda/bin:$PATH
---> Running in 99a550898d70
---> Removed intermediate container 99a550898d70
---> 8a6dc47773ca
Step 6/36 : ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
---> Running in 7f7a8ff7fe60
---> Removed intermediate container 7f7a8ff7fe60
---> a1e782464508
Step 7/36 : ENV CUDA_LAUNCH_BLOCKING=1
---> Running in d30ee709ccb8
---> Removed intermediate container d30ee709ccb8
---> 7b3c2e660c86
Step 8/36 : ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32
---> Running in fd33cb2c994e
---> Removed intermediate container fd33cb2c994e
---> f643d622b3a3
Step 9/36 : ENV CUDA_MODULE_LOADING=LAZY
---> Running in f15a43d7760f
---> Removed intermediate container f15a43d7760f
---> 179602d828d5
Step 10/36 : ENV TORCH_USE_CUDA_DSA=0
---> Running in 76f7e57b12bb
---> Removed intermediate container 76f7e57b12bb
---> da2002f15766
Step 11/36 : ENV CUDA_DEVICE_MAX_CONNECTIONS=1
---> Running in cdeee2326c47
---> Removed intermediate container cdeee2326c47
---> 2bc8913df042
Step 12/36 : ENV NCCL_ASYNC_ERROR_HANDLING=1
---> Running in 9ac054a82e92
---> Removed intermediate container 9ac054a82e92
---> 35f666ed11d8
Step 13/36 : ENV OMP_NUM_THREADS=1
---> Running in aa0e6d47ae1b
---> Removed intermediate container aa0e6d47ae1b
---> 307dc43e533c
Step 14/36 : ENV NVTE_FRAMEWORK=pytorch
---> Running in 08f225a5cbbc
---> Removed intermediate container 08f225a5cbbc
---> 54efb6a6e09f
Step 15/36 : ENV MAX_JOBS=4
---> Running in 2b2dd72747e4
---> Removed intermediate container 2b2dd72747e4
---> 73e2766080c5
Step 16/36 : ENV DEBIAN_FRONTEND=noninteractive
---> Running in efd6f079e51b
---> Removed intermediate container efd6f079e51b
---> b78d3b506e53
Step 17/36 : ENV TORCH_CUDNN_V8_API_ENABLED=1
---> Running in 2eee13c9b995
---> Removed intermediate container 2eee13c9b995
---> 7e5456c99cc4
Step 18/36 : ENV TORCH_ALLOW_TF32=1
---> Running in 26a2f92cb2e5
---> Removed intermediate container 26a2f92cb2e5
---> 86ea7f148fa2
Step 19/36 : ENV TORCH_SHOW_CPP_STACKTRACES=0
---> Running in d26e4b1c982d
---> Removed intermediate container d26e4b1c982d
---> cc9357ade82a
Step 20/36 : ENV PYTHONWARNINGS=ignore
---> Running in 8879c2422b30
---> Removed intermediate container 8879c2422b30
---> 45ca7efb529b
Step 21/36 : ENV NVIDIA_VISIBLE_DEVICES="all"
---> Running in e2ed3be67a1b
---> Removed intermediate container e2ed3be67a1b
---> b538292c6dd5
Step 22/36 : RUN python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
---> Running in fd1cc6a46d3b
PyTorch version: 2.3.0a0+6ddf5cf85e.nv24.04
---> Removed intermediate container fd1cc6a46d3b
---> 89bad2db14e0
Step 23/36 : RUN mkdir -p /app/workspace
---> Running in b45a54ccaf22
---> Removed intermediate container b45a54ccaf22
---> 68ba08df583f
Step 24/36 : COPY requirements.txt /app/workspace/
---> 3aee75995cc2
Step 25/36 : COPY *.py /app/workspace/
---> 8cf3ad6bdca7
Step 26/36 : COPY dyana-requirements*.txt /app/workspace/
---> caebc41391a3
Step 27/36 : WORKDIR /app/workspace
---> Running in 2b67a911a625
---> Removed intermediate container 2b67a911a625
---> 6f84ae7cad7a
Step 28/36 : RUN pip install --no-cache-dir -r requirements.txt
---> Running in 6fab7fef9d7a
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com,
https://download.pytorch.org/whl/cu121
Looking in links: https://developer.download.nvidia.com/compute/redist
Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 5)) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 6)) (23.2)
Requirement already satisfied: typing_extensions>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 7)) (4.10.0)
Collecting flash-attn==2.6.1 (from -r requirements.txt (line 10))
Downloading flash_attn-2.6.1.tar.gz (2.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 13.3 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting sentencepiece==0.2.0 (from -r requirements.txt (line 11))
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 12))
Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting hydra_colorlog==1.2.0 (from -r requirements.txt (line 13))
Downloading hydra_colorlog-1.2.0-py3-none-any.whl.metadata (949 bytes)
Collecting nltk (from -r requirements.txt (line 14))
Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting datasets (from -r requirements.txt (line 15))
Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: psutil>=5.6.7 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 18)) (5.9.4)
Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.6.1->-r
requirements.txt (line 10)) (0.7.0)
Collecting omegaconf<2.4,>=2.2 (from hydra-core==1.3.2->-r requirements.txt (line 12))
Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r requirements.txt (line 12))
Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 38.4 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting colorlog (from hydra_colorlog==1.2.0->-r requirements.txt (line 13))
Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.13.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2024.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (4.66.2)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.24.4)
Collecting pyarrow>=15.0.0 (from datasets->-r requirements.txt (line 15))
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets->-r requirements.txt (line 15))
Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.5.3)
Collecting requests>=2.32.2 (from datasets->-r requirements.txt (line 15))
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tqdm (from nltk->-r requirements.txt (line 14))
Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.7/57.7 kB 51.4 MB/s eta 0:00:00
Collecting xxhash (from datasets->-r requirements.txt (line 15))
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets->-r requirements.txt (line 15))
Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (3.9.3)
Collecting huggingface-hub>=0.23.0 (from datasets->-r requirements.txt (line 15))
Downloading huggingface_hub-0.28.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (6.0.1)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (23.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.4.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.9.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (4.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (2024.2.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch>=2.0.0->-r requirements.txt (line 5)) (2.1.5)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from
pandas->datasets->-r requirements.txt (line 15)) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from
pandas->datasets->-r requirements.txt (line 15)) (2024.1)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch>=2.0.0->-r requirements.txt (line 5)) (1.3.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from
python-dateutil>=2.8.1->pandas->datasets->-r requirements.txt (line 15)) (1.16.0)
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 26.0 MB/s eta 0:00:00
Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.5/154.5 kB 53.7 MB/s eta 0:00:00
Downloading hydra_colorlog-1.2.0-py3-none-any.whl (3.6 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 30.8 MB/s eta 0:00:00
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.6/480.6 kB 48.9 MB/s eta 0:00:00
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 52.0 MB/s eta 0:00:00
Downloading huggingface_hub-0.28.1-py3-none-any.whl (464 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 464.1/464.1 kB 47.9 MB/s eta 0:00:00
Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 42.9 MB/s eta 0:00:00
Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 83.9 MB/s eta 0:00:00
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (42.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 77.1 MB/s eta 0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 99.1 MB/s eta 0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 92.9 MB/s eta 0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 106.4 MB/s eta 0:00:00
Building wheels for collected packages: flash-attn, antlr4-python3-runtime
Building wheel for flash-attn (setup.py): started
Building wheel for flash-attn (setup.py): finished with status 'done'
Created wheel for flash-attn: filename=flash_attn-2.6.1-cp310-cp310-linux_x86_64.whl size=198444860
sha256=8da92ac0324e367e37327e6cdd621b9e0a02e72b6328c7ad8096ffa4a9c9b699
Stored in directory:
/tmp/pip-ephem-wheel-cache-7q3zl71t/wheels/91/6a/38/f0faa036b4ac73a73247386f1ab1bb4cb4f6e72e6861a779f1
Building wheel for antlr4-python3-runtime (setup.py): started
Building wheel for antlr4-python3-runtime (setup.py): finished with status 'done'
Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554
sha256=4273d339d9c87ca721fe10ef7b3b6214c5908051c06c44c04fcaf6ac5e62e2fe
Stored in directory:
/tmp/pip-ephem-wheel-cache-7q3zl71t/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88
Successfully built flash-attn antlr4-python3-runtime
Installing collected packages: sentencepiece, antlr4-python3-runtime, xxhash, tqdm, requests, pyarrow,
omegaconf, dill, colorlog, nltk, multiprocess, hydra-core, huggingface-hub, hydra_colorlog, flash-attn,
datasets
Attempting uninstall: tqdm
Found existing installation: tqdm 4.66.2
Uninstalling tqdm-4.66.2:
Successfully uninstalled tqdm-4.66.2
Attempting uninstall: requests
Found existing installation: requests 2.31.0
Uninstalling requests-2.31.0:
Successfully uninstalled requests-2.31.0
Attempting uninstall: pyarrow
Found existing installation: pyarrow 14.0.1
Uninstalling pyarrow-14.0.1:
Successfully uninstalled pyarrow-14.0.1
Attempting uninstall: flash-attn
Found existing installation: flash-attn 2.4.2
Uninstalling flash-attn-2.4.2:
Successfully uninstalled flash-attn-2.4.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are
installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.17.0a0 requires torch==2.3.0a0+6ddf5cf, but you have torch 2.3.0a0+6ddf5cf85e.nv24.4 which is
incompatible.
transformer-engine 1.5.0+6a9edc3 requires flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6, but you have flash-attn
2.6.1 which is incompatible.
Successfully installed antlr4-python3-runtime-4.9.3 colorlog-6.9.0 datasets-3.2.0 dill-0.3.8 flash-attn-2.6.1
huggingface-hub-0.28.1 hydra-core-1.3.2 hydra_colorlog-1.2.0 multiprocess-0.70.16 nltk-3.9.1 omegaconf-2.3.0
pyarrow-19.0.0 requests-2.32.3 sentencepiece-0.2.0 tqdm-4.67.1 xxhash-3.5.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with
the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container 6fab7fef9d7a
---> 50c833fb458f
Step 29/36 : RUN git clone --depth 1 --branch dmc https://github.com/NVIDIA/Megatron-LM.git /app/Megatron-LM
&& cd /app/Megatron-LM && pip install -e .
---> Running in 17dbf511c5b3
Cloning into '/app/Megatron-LM'...
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///app/Megatron-LM
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with status 'done'
Getting requirements to build editable: started
Getting requirements to build editable: finished with status 'done'
Preparing editable metadata (pyproject.toml): started
Preparing editable metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (23.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.13.3)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (4.10.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch->megatron-core==0.10.0rc0) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch->megatron-core==0.10.0rc0) (1.3.0)
Building wheels for collected packages: megatron-core
Building editable for megatron-core (pyproject.toml): started
Building editable for megatron-core (pyproject.toml): finished with status 'done'
Created wheel for megatron-core: filename=megatron_core-0.10.0rc0-0.editable-cp310-cp310-linux_x86_64.whl
size=16959 sha256=8516aa6bed59711397451231d8c2f9cef657bf62e407c95f6882ca47ea72bafc
Stored in directory:
/tmp/pip-ephem-wheel-cache-demzk1ln/wheels/ac/13/e6/1ba2b5b3bea71b4ae468d84030e90cc700c9d586023e483a1e
Successfully built megatron-core
Installing collected packages: megatron-core
Successfully installed megatron-core-0.10.0rc0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with
the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container 17dbf511c5b3
---> 25e53ff6551a
Step 30/36 : ENV PYTHONPATH=/app/Megatron-LM:$PYTHONPATH
---> Running in a5db5d8f5dca
---> Removed intermediate container a5db5d8f5dca
---> 6e156f776e82
Step 31/36 : RUN mkdir -p /dev/shm && mkdir -p /tmp/pytorch_extensions && chmod -R 777 /dev/shm
/tmp/pytorch_extensions
---> Running in b076d3e86060
---> Removed intermediate container b076d3e86060
---> f1ee3c75ba35
Step 32/36 : RUN printf '#!/bin/bash\n export PYTHONPATH=/app/workspace:/app/Megatron-LM:$PYTHONPATH\n
exec python3 -W ignore main.py "$@"\n' > /app/workspace/entrypoint.sh && chmod +x
/app/workspace/entrypoint.sh
---> Running in 3795c50ecd75
---> Removed intermediate container 3795c50ecd75
---> a613e904cbac
Step 33/36 : RUN ls -la /app/workspace && ls -la /app/workspace/entrypoint.sh && test -x
/app/workspace/entrypoint.sh
---> Running in 0933e5a094c2
total 48
drwxr-xr-x 1 root root 4096 Feb 4 20:02 .
drwxr-xr-x 1 root root 4096 Feb 4 20:02 ..
-rw-rw-r-- 1 root root 42 Feb 4 19:54 dyana-requirements-gpu.txt
-rw-rw-r-- 1 root root 29 Feb 4 19:54 dyana-requirements.txt
-rw-rw-r-- 1 root root 6886 Feb 4 19:54 dyana.py
-rwxr-xr-x 1 root root 118 Feb 4 20:02 entrypoint.sh
-rw-rw-r-- 1 root root 9170 Feb 4 17:14 main.py
-rw-rw-r-- 1 root root 365 Feb 4 13:52 requirements.txt
-rw-rw-r-- 1 root root 594 Feb 4 15:24 verify.py
-rwxr-xr-x 1 root root 118 Feb 4 20:02 /app/workspace/entrypoint.sh
---> Removed intermediate container 0933e5a094c2
---> b7dc697abe44
Step 34/36 : RUN chown -R root:root /app && chmod -R 755 /app && chmod +x
/app/workspace/entrypoint.sh
---> Running in fafa75f3ccc7
---> Removed intermediate container fafa75f3ccc7
---> 320dd2f144e3
Step 35/36 : SHELL ["/bin/bash", "-c"]
---> Running in 3749fe8f8356
---> Removed intermediate container 3749fe8f8356
---> b98488fdbb4a
Step 36/36 : ENTRYPOINT ["/bin/bash", "-c", "exec /app/workspace/entrypoint.sh \"$@\""]
---> Running in 5fbd07eeabec
---> Removed intermediate container 5fbd07eeabec
---> 448c58e5c18c
Successfully built 448c58e5c18c
Successfully tagged dyana-megatron-loader:latest
👁️🗨️ tracer: initializing ...
👁️🗨️ tracer: started ...
🍿 loader: warning: allowing bridged network access to the container
🍿 loader: executing with arguments ['--model', '/model_optim_rng.pt', '--tokenizer',
'/llama-2-7b-tokenizer.model', '--size', '7B', '--input', 'This is an example prompt.'] ...
👁️🗨️ tracer: stopping ...
🗃 saving 3797 events to trace.json
Platform : Linux-6.8.0-52-generic-x86_64-with-glibc2.35
Loader : megatron
Arguments : --model /model_optim_rng.pt --tokenizer /llama-2-7b-tokenizer.model --size 7B --input This
is an example prompt.
Volumes : /model_optim_rng.pt (/home/ads/model_optim_rng.pt), /llama-2-7b-tokenizer.model
(/home/ads/llama-2-7b-tokenizer.model)
Started at : 2025-02-04T13:03:00.501217
Ended at : 2025-02-04T13:03:04.317836
Total Events : 3797
Stdout : [W init.cpp:767] Warning: nvfuser is no longer supported in torch script, use
_jit_set_nvfuser_enabled is deprecated and a no-op (function operator())
RAM Usage:
* start : 362.9MiB
* cuda_initialized : 475.8MiB 🔺 112.9MiB
* end : 475.8MiB
* end : 475.8MiB
Disk Usage:
* start : 1.4TiB
* cuda_initialized : 1.4TiB 🔺 140.0KiB
* end : 1.4TiB 🔺 212.0KiB
* end : 1.4TiB 🔺 184.0KiB
Process Executions:
* 1 python3 -> execve /usr/bin/python3 ['python3', '-W', 'ignore', 'main.py', '/model_optim_rng.pt',
'--tokenizer', '/llama-2-7b-tokenizer.model', '--size', '7B', '--input', 'This is an example prompt.']
* 10 hostname -> execve /usr/bin/hostname ['hostname']
* 19 touch -> execve /usr/bin/touch ['touch', '/usr/local/cuda/compat/.570.86.10.3b0f18805def.checked']
* 20 rm -> execve /usr/bin/rm ['rm', '-f', '/usr/local/cuda/compat/lib']
* 30 hostname -> execve /usr/bin/hostname ['hostname']
* 9 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 8 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel Module\\( for *\\| \\) *\\([^()
]*\\).*$/\\2/p', '/proc/driver/nvidia/version']
* 13 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 14 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 12 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q', '-d', 'COMPUTE']
* 18 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 22 timeout -> execve /usr/bin/timeout ['timeout', '-s', 'KILL', '35', '/usr/local/bin/cudaCheck']
* 23 cudaCheck -> execve /usr/local/bin/cudaCheck ['/usr/local/bin/cudaCheck']
* 26 cat -> execve /usr/bin/cat ['cat', '/sys/module/mlx5_core/version']
* 28 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel Module\\( for *\\| \\) *\\([^()
]*\\).*$/\\2/p', '/proc/driver/nvidia/version']
* 29 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 33 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 34 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 32 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q', '-d', 'COMPUTE']
* 38 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 40 cat -> execve /usr/bin/cat ['cat', '/sys/module/mlx5_core/version']
Network Usage:
eth0
start : rx=1.9KiB tx=0.0B
cuda_initialized : rx=1.9KiB tx=0.0B
end : rx=1.9KiB tx=0.0B
end : rx=1.9KiB tx=0.0B
Network Activity:
* [23] cudaCheck -> connect /tmp/nvidia-mps/control
* [1] fuse -> connect /var/run/nscd/socket
* [1] fuse -> connect /tmp/ucx-vfs-root.sock
* [1] python3 -> connect /tmp/nvidia-mps/control
File Accesses:
* /app/Megatron-LM
* /app/workspace
* /app/workspace/__pycache__/dyana.cpython-310.pyc.135872031299872
* /app/workspace/dyana.py
* /app/workspace/entrypoint.sh
* /app/workspace/main.py
* /proc
* /usr/bin/cat
* /usr/bin/cut
* /usr/bin/grep
* /usr/bin/hostname
* /usr/bin/nvidia-smi
* /usr/bin/python3
* /usr/bin/rm
* /usr/bin/sed
* /usr/bin/timeout
* /usr/bin/touch
* /usr/local/bin/cudaCheck
* /usr/local/cuda/compat/.570.86.10.3b0f18805def.checked
* /usr/local/cuda/compat/lib.real/libcuda.so.1
* /usr/local/cuda/lib64/libcublas.so.12
* /usr/local/cuda/lib64/libcublasLt.so.12
* /usr/local/cuda/lib64/libcudart.so.12
* /usr/local/cuda/lib64/libcufft.so.11
* /usr/local/cuda/lib64/libcupti.so.12
* /usr/local/cuda/lib64/libcurand.so.10
* /usr/local/cuda/lib64/libcusparse.so.12
* /usr/local/cuda/lib64/libnvJitLink.so.12
* /usr/local/cuda/lib64/libnvToolsExt.so.1
* 2075 accesses to /usr/local/lib/*
* 1114 accesses to /usr/lib/*
* 18 accesses to /lib/*
* 102 accesses to /dev/*
* 146 accesses to /proc/*
* 45 accesses to /sys/*
* 90 accesses to /etc/*
* 1 accesses to /tmp/*
Security Events:
* Dynamic code loading detected (defense-evasion, moderate severity)
Total Events : 3797
Stdout : [W init.cpp:767] Warning: nvfuser is no longer supported in torch script, use
_jit_set_nvfuser_enabled is deprecated and a no-op (function operator())
i think you are correct and most likely (although removed in my last test) the culprit is
i think i may be getting somewhere @evilsocket! 😁 🤞
Total reclaimed space: 20.91GB
🐳 loader: initializing loader megatron
Step 1/40 : FROM nvcr.io/nvidia/pytorch:24.04-py3
---> 3f0b23af1f4f
Step 2/40 : WORKDIR /app
---> Running in ab0a4560b9a7
---> Removed intermediate container ab0a4560b9a7
---> 0373988ab07f
Step 3/40 : RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates
build-essential && rm -rf /var/lib/apt/lists/*
---> Running in bc8c88d29bb0
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:5 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3606 kB]
Get:6 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [45.2 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1229 kB]
Get:8 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2606 kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]
Get:10 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]
Get:11 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2906 kB]
Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [3742 kB]
Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1522 kB]
Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [53.3 kB]
Get:17 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [81.4 kB]
Get:18 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [35.2 kB]
Fetched 36.2 MB in 3s (10.8 MB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
build-essential is already the newest version (12.9ubuntu3).
Suggested packages:
gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui
gitk gitweb git-cvs git-mediawiki git-svn
The following packages will be upgraded:
ca-certificates git
2 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.
Need to get 3328 kB of archives.
After this operation, 29.7 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 ca-certificates all 20240203~22.04.1 [162 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 git amd64 1:2.34.1-1ubuntu1.12 [3165 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 3328 kB in 2s (2118 kB/s)
(Reading database ... 23388 files and directories currently installed.)
Preparing to unpack .../ca-certificates_20240203~22.04.1_all.deb ...
Unpacking ca-certificates (20240203~22.04.1) over (20230311ubuntu0.22.04.1) ...
Preparing to unpack .../git_1%3a2.34.1-1ubuntu1.12_amd64.deb ...
Unpacking git (1:2.34.1-1ubuntu1.12) over (1:2.34.1-1ubuntu1.10) ...
Setting up ca-certificates (20240203~22.04.1) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
14 added, 5 removed; done.
Setting up git (1:2.34.1-1ubuntu1.12) ...
Processing triggers for ca-certificates (20240203~22.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
---> Removed intermediate container bc8c88d29bb0
---> 43b26c300f4f
Step 4/40 : ENV CUDA_HOME=/usr/local/cuda
---> Running in 11b4840272c5
---> Removed intermediate container 11b4840272c5
---> a5d06c6d2656
Step 5/40 : ENV PATH=/usr/local/cuda/bin:$PATH
---> Running in e009e74e3bd8
---> Removed intermediate container e009e74e3bd8
---> 0bbc05b3d9ce
Step 6/40 : ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
---> Running in 96811fb735f5
---> Removed intermediate container 96811fb735f5
---> c5d64482b550
Step 7/40 : ENV CUDA_LAUNCH_BLOCKING=1
---> Running in 3f52d1450414
---> Removed intermediate container 3f52d1450414
---> 96bc2bf962c5
Step 8/40 : ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32
---> Running in 32e4bcda6b19
---> Removed intermediate container 32e4bcda6b19
---> 6863fb89dbf4
Step 9/40 : ENV CUDA_MODULE_LOADING=LAZY
---> Running in 51aa0621d0b4
---> Removed intermediate container 51aa0621d0b4
---> e200ae56f23e
Step 10/40 : ENV TORCH_USE_CUDA_DSA=0
---> Running in 6b6a6afab154
---> Removed intermediate container 6b6a6afab154
---> 191b0b01e70f
Step 11/40 : ENV CUDA_DEVICE_MAX_CONNECTIONS=1
---> Running in 9259e2225be9
---> Removed intermediate container 9259e2225be9
---> 4f4cd5299bf9
Step 12/40 : ENV NCCL_ASYNC_ERROR_HANDLING=1
---> Running in cf46e48d1d01
---> Removed intermediate container cf46e48d1d01
---> f2bac12b1aa7
Step 13/40 : ENV OMP_NUM_THREADS=1
---> Running in 00f810cb38cc
---> Removed intermediate container 00f810cb38cc
---> e07237f4989e
Step 14/40 : ENV NVTE_FRAMEWORK=pytorch
---> Running in 8d4fb84a44b7
---> Removed intermediate container 8d4fb84a44b7
---> 0ae617758e61
Step 15/40 : ENV MAX_JOBS=4
---> Running in 1b604d30afd5
---> Removed intermediate container 1b604d30afd5
---> a63b3536ff38
Step 16/40 : ENV DEBIAN_FRONTEND=noninteractive
---> Running in 30b7de6c53cf
---> Removed intermediate container 30b7de6c53cf
---> 22436873eed5
Step 17/40 : ENV TORCH_CUDNN_V8_API_ENABLED=1
---> Running in 1bb49e7718b4
---> Removed intermediate container 1bb49e7718b4
---> eca9e78aaffa
Step 18/40 : ENV TORCH_ALLOW_TF32=1
---> Running in 6a9882cdd5a0
---> Removed intermediate container 6a9882cdd5a0
---> a7ce4496d174
Step 19/40 : ENV TORCH_SHOW_CPP_STACKTRACES=0
---> Running in 52da5ac27fed
---> Removed intermediate container 52da5ac27fed
---> 627054ed1c1a
Step 20/40 : ENV PYTHONWARNINGS=ignore
---> Running in c505e753b179
---> Removed intermediate container c505e753b179
---> 0f2e80cb0b2b
Step 21/40 : ENV NVIDIA_VISIBLE_DEVICES="all"
---> Running in a3df96478c05
---> Removed intermediate container a3df96478c05
---> f2fe82ed8e68
Step 22/40 : ENV CUDA_DEVICE_ORDER=PCI_BUS_ID
---> Running in e6fd9f1591a6
---> Removed intermediate container e6fd9f1591a6
---> 4bf4c11db134
Step 23/40 : ENV TORCH_USE_CUDA_DSA=1
---> Running in 4fb16cf7befd
---> Removed intermediate container 4fb16cf7befd
---> eb889a0dd8c6
Step 24/40 : ENV PYTORCH_JIT=0
---> Running in 75ead9479a19
---> Removed intermediate container 75ead9479a19
---> 5c39d6971b47
Step 25/40 : ENV TORCH_INDUCTOR_DISABLE_CUDA_GRAPH=0
---> Running in c4dedf771825
---> Removed intermediate container c4dedf771825
---> 35211500224d
Step 26/40 : RUN python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
---> Running in 32e22e00cfe8
PyTorch version: 2.3.0a0+6ddf5cf85e.nv24.04
---> Removed intermediate container 32e22e00cfe8
---> fd0bff8aca3c
Step 27/40 : RUN mkdir -p /app/workspace
---> Running in 812d60f9c92d
---> Removed intermediate container 812d60f9c92d
---> c2ef5e9fd1b5
Step 28/40 : COPY requirements.txt /app/workspace/
---> 962e1f5277b1
Step 29/40 : COPY *.py /app/workspace/
---> edca5df6533a
Step 30/40 : COPY dyana-requirements*.txt /app/workspace/
---> b46095479ae6
Step 31/40 : WORKDIR /app/workspace
---> Running in 7239eb24417b
---> Removed intermediate container 7239eb24417b
---> 986068158185
Step 32/40 : RUN pip install --no-cache-dir -r requirements.txt
---> Running in 0723a5c5a5a8
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com,
https://download.pytorch.org/whl/cu121
Looking in links: https://developer.download.nvidia.com/compute/redist
Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 5)) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 6)) (23.2)
Requirement already satisfied: typing_extensions>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 7)) (4.10.0)
Collecting flash-attn==2.6.1 (from -r requirements.txt (line 10))
Downloading flash_attn-2.6.1.tar.gz (2.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 4.7 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting sentencepiece==0.2.0 (from -r requirements.txt (line 11))
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 12))
Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting hydra_colorlog==1.2.0 (from -r requirements.txt (line 13))
Downloading hydra_colorlog-1.2.0-py3-none-any.whl.metadata (949 bytes)
Collecting nltk (from -r requirements.txt (line 14))
Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting datasets (from -r requirements.txt (line 15))
Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: psutil>=5.6.7 in /usr/local/lib/python3.10/dist-packages (from -r
requirements.txt (line 18)) (5.9.4)
Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.6.1->-r
requirements.txt (line 10)) (0.7.0)
Collecting omegaconf<2.4,>=2.2 (from hydra-core==1.3.2->-r requirements.txt (line 12))
Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r requirements.txt (line 12))
Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 18.0 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting colorlog (from hydra_colorlog==1.2.0->-r requirements.txt (line 13))
Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.13.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2024.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (4.66.2)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.24.4)
Collecting pyarrow>=15.0.0 (from datasets->-r requirements.txt (line 15))
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets->-r requirements.txt (line 15))
Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.5.3)
Collecting requests>=2.32.2 (from datasets->-r requirements.txt (line 15))
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tqdm (from nltk->-r requirements.txt (line 14))
Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.7/57.7 kB 17.7 MB/s eta 0:00:00
Collecting xxhash (from datasets->-r requirements.txt (line 15))
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets->-r requirements.txt (line 15))
Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (3.9.3)
Collecting huggingface-hub>=0.23.0 (from datasets->-r requirements.txt (line 15))
Downloading huggingface_hub-0.28.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (6.0.1)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (23.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.4.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (1.9.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from
aiohttp->datasets->-r requirements.txt (line 15)) (4.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (2024.2.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch>=2.0.0->-r requirements.txt (line 5)) (2.1.5)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from
pandas->datasets->-r requirements.txt (line 15)) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from
pandas->datasets->-r requirements.txt (line 15)) (2024.1)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch>=2.0.0->-r requirements.txt (line 5)) (1.3.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from
python-dateutil>=2.8.1->pandas->datasets->-r requirements.txt (line 15)) (1.16.0)
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 7.9 MB/s eta 0:00:00
Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.5/154.5 kB 16.1 MB/s eta 0:00:00
Downloading hydra_colorlog-1.2.0-py3-none-any.whl (3.6 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 10.6 MB/s eta 0:00:00
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.6/480.6 kB 13.8 MB/s eta 0:00:00
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 25.2 MB/s eta 0:00:00
Downloading huggingface_hub-0.28.1-py3-none-any.whl (464 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 464.1/464.1 kB 14.2 MB/s eta 0:00:00
Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 30.9 MB/s eta 0:00:00
Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 41.9 MB/s eta 0:00:00
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (42.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 70.8 MB/s eta 0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 97.4 MB/s eta 0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 99.8 MB/s eta 0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 88.5 MB/s eta 0:00:00
Building wheels for collected packages: flash-attn, antlr4-python3-runtime
Building wheel for flash-attn (setup.py): started
Building wheel for flash-attn (setup.py): finished with status 'done'
Created wheel for flash-attn: filename=flash_attn-2.6.1-cp310-cp310-linux_x86_64.whl size=198444860
sha256=8da92ac0324e367e37327e6cdd621b9e0a02e72b6328c7ad8096ffa4a9c9b699
Stored in directory:
/tmp/pip-ephem-wheel-cache-rwhgfy07/wheels/91/6a/38/f0faa036b4ac73a73247386f1ab1bb4cb4f6e72e6861a779f1
Building wheel for antlr4-python3-runtime (setup.py): started
Building wheel for antlr4-python3-runtime (setup.py): finished with status 'done'
Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554
sha256=ef8554c499850c5455842b54b6211575cadadfc21c719b36a173dc9963506562
Stored in directory:
/tmp/pip-ephem-wheel-cache-rwhgfy07/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88
Successfully built flash-attn antlr4-python3-runtime
Installing collected packages: sentencepiece, antlr4-python3-runtime, xxhash, tqdm, requests, pyarrow,
omegaconf, dill, colorlog, nltk, multiprocess, hydra-core, huggingface-hub, hydra_colorlog, flash-attn,
datasets
Attempting uninstall: tqdm
Found existing installation: tqdm 4.66.2
Uninstalling tqdm-4.66.2:
Successfully uninstalled tqdm-4.66.2
Attempting uninstall: requests
Found existing installation: requests 2.31.0
Uninstalling requests-2.31.0:
Successfully uninstalled requests-2.31.0
Attempting uninstall: pyarrow
Found existing installation: pyarrow 14.0.1
Uninstalling pyarrow-14.0.1:
Successfully uninstalled pyarrow-14.0.1
Attempting uninstall: flash-attn
Found existing installation: flash-attn 2.4.2
Uninstalling flash-attn-2.4.2:
Successfully uninstalled flash-attn-2.4.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are
installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.17.0a0 requires torch==2.3.0a0+6ddf5cf, but you have torch 2.3.0a0+6ddf5cf85e.nv24.4 which is
incompatible.
transformer-engine 1.5.0+6a9edc3 requires flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6, but you have flash-attn
2.6.1 which is incompatible.
Successfully installed antlr4-python3-runtime-4.9.3 colorlog-6.9.0 datasets-3.2.0 dill-0.3.8 flash-attn-2.6.1
huggingface-hub-0.28.1 hydra-core-1.3.2 hydra_colorlog-1.2.0 multiprocess-0.70.16 nltk-3.9.1 omegaconf-2.3.0
pyarrow-19.0.0 requests-2.32.3 sentencepiece-0.2.0 tqdm-4.67.1 xxhash-3.5.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with
the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container 0723a5c5a5a8
---> 5dbb48c847a9
Step 33/40 : RUN git clone --depth 1 --branch dmc https://github.com/NVIDIA/Megatron-LM.git /app/Megatron-LM
&& cd /app/Megatron-LM && pip install -e .
---> Running in b1e3fa7c880c
Cloning into '/app/Megatron-LM'...
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///app/Megatron-LM
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with status 'done'
Getting requirements to build editable: started
Getting requirements to build editable: finished with status 'done'
Preparing editable metadata (pyproject.toml): started
Preparing editable metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (23.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.13.3)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (4.10.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch->megatron-core==0.10.0rc0) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch->megatron-core==0.10.0rc0) (1.3.0)
Building wheels for collected packages: megatron-core
Building editable for megatron-core (pyproject.toml): started
Building editable for megatron-core (pyproject.toml): finished with status 'done'
Created wheel for megatron-core: filename=megatron_core-0.10.0rc0-0.editable-cp310-cp310-linux_x86_64.whl
size=16959 sha256=63b747c8d76598c153d0e12e9ca98d74cde5d5afdca4b8432bf766ff06cf2209
Stored in directory:
/tmp/pip-ephem-wheel-cache-v8caijc3/wheels/ac/13/e6/1ba2b5b3bea71b4ae468d84030e90cc700c9d586023e483a1e
Successfully built megatron-core
Installing collected packages: megatron-core
Successfully installed megatron-core-0.10.0rc0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with
the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container b1e3fa7c880c
---> 6e5f565269af
Step 34/40 : ENV PYTHONPATH=/app/Megatron-LM:$PYTHONPATH
---> Running in 5559cdcc301d
---> Removed intermediate container 5559cdcc301d
---> ba878e128422
Step 35/40 : RUN mkdir -p /dev/shm && mkdir -p /tmp/pytorch_extensions && chmod -R 777 /dev/shm
/tmp/pytorch_extensions
---> Running in c90b5a2cfb75
---> Removed intermediate container c90b5a2cfb75
---> 25260a483825
Step 36/40 : RUN printf '#!/bin/bash\npython3 -c "import torch; assert torch.cuda.is_available(), \"CUDA is
not available\"; device=torch.cuda.get_device_name(); print(f\"CUDA OK: {device}\")" && export
PYTHONPATH=/app/workspace:/app/Megatron-LM:$PYTHONPATH && exec python3 -W ignore main.py "$@"' >
/app/workspace/entrypoint.sh && chmod +x /app/workspace/entrypoint.sh
---> Running in 5d3465f15217
---> Removed intermediate container 5d3465f15217
---> 53c7c4c72420
Step 37/40 : RUN ls -la /app/workspace && ls -la /app/workspace/entrypoint.sh && test -x
/app/workspace/entrypoint.sh
---> Running in 4375cd60158b
total 48
drwxr-xr-x 1 root root 4096 Feb 4 21:37 .
drwxr-xr-x 1 root root 4096 Feb 4 21:37 ..
-rw-rw-r-- 1 root root 42 Feb 4 21:30 dyana-requirements-gpu.txt
-rw-rw-r-- 1 root root 29 Feb 4 21:30 dyana-requirements.txt
-rw-rw-r-- 1 root root 6886 Feb 4 21:30 dyana.py
-rwxr-xr-x 1 root root 266 Feb 4 21:37 entrypoint.sh
-rw-rw-r-- 1 root root 9314 Feb 4 21:29 main.py
-rw-rw-r-- 1 root root 365 Feb 4 13:52 requirements.txt
-rw-rw-r-- 1 root root 594 Feb 4 15:24 verify.py
-rwxr-xr-x 1 root root 266 Feb 4 21:37 /app/workspace/entrypoint.sh
---> Removed intermediate container 4375cd60158b
---> c2844872cac8
Step 38/40 : RUN chown -R root:root /app && chmod -R 755 /app && chmod +x
/app/workspace/entrypoint.sh
---> Running in 340415a1d003
---> Removed intermediate container 340415a1d003
---> 1c11315b91bf
Step 39/40 : SHELL ["/bin/bash", "-c"]
---> Running in b48c55d0cb17
---> Removed intermediate container b48c55d0cb17
---> a85e04a9c365
Step 40/40 : ENTRYPOINT ["/bin/bash", "-c", "exec /app/workspace/entrypoint.sh \"$@\""]
---> Running in dcc720c9a994
---> Removed intermediate container dcc720c9a994
---> 5c6caceee425
Successfully built 5c6caceee425
Successfully tagged dyana-megatron-loader:latest
👁️🗨️ tracer: initializing ...
👁️🗨️ tracer: started ...
🍿 loader: warning: allowing bridged network access to the container
🍿 loader: executing with arguments ['--model', '/model_optim_rng.pt', '--tokenizer',
'/llama-2-7b-tokenizer.model', '--size', '7B', '--input', 'This is an example prompt.'] ...
👁️🗨️ tracer: stopping ...
🗃 saving 6014 events to trace.json
Platform : Linux-6.8.0-52-generic-x86_64-with-glibc2.35
Loader : megatron
Arguments : --model /model_optim_rng.pt --tokenizer /llama-2-7b-tokenizer.model --size 7B --input This
is an example prompt.
Volumes : /model_optim_rng.pt (/home/ads/model_optim_rng.pt), /llama-2-7b-tokenizer.model
(/home/ads/llama-2-7b-tokenizer.model)
Started at : 2025-02-04T14:38:23.384119
Ended at : 2025-02-04T14:38:29.238120
Total Events : 6014
RAM Usage:
* start : 381.1MiB
* cuda_initialized : 487.2MiB 🔺 106.1MiB
* end : 487.2MiB
* end : 487.2MiB
GPU Usage:
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 422.0MiB
* cuda_initialized : 498.0MiB 🔺 76.0MiB
* end : 498.0MiB
* end : 498.0MiB
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 422.0MiB
* cuda_initialized : 422.0MiB
* end : 422.0MiB
* end : 422.0MiB
Disk Usage:
* start : 1.5TiB
* cuda_initialized : 1.5TiB 🔺 120.0KiB
* end : 1.5TiB 🔺 196.0KiB
* end : 1.5TiB 🔺 236.0KiB
Process Executions:
* 1 python3 -> execve /usr/bin/python3 ['python3', '-W', 'ignore', 'main.py', '/model_optim_rng.pt',
'--tokenizer', '/llama-2-7b-tokenizer.model', '--size', '7B', '--input', 'This is an example prompt.']
* 10 hostname -> execve /usr/bin/hostname ['hostname']
* 19 touch -> execve /usr/bin/touch ['touch', '/usr/local/cuda/compat/.570.86.10.76bc0a63b927.checked']
* 20 rm -> execve /usr/bin/rm ['rm', '-f', '/usr/local/cuda/compat/lib']
* 30 hostname -> execve /usr/bin/hostname ['hostname']
* 41 python3 -> execve /usr/bin/python3 ['python3', '-c', 'import torch; assert torch.cuda.is_available(),
CUDA', 'is', 'not', 'available; device=torch.cuda.get_device_name(); print(fCUDA', 'OK:', '{device})']
* 9 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 8 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel Module\\( for *\\| \\) *\\([^()
]*\\).*$/\\2/p', '/proc/driver/nvidia/version']
* 12 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q', '-d', 'COMPUTE']
* 14 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 13 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 18 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 22 timeout -> execve /usr/bin/timeout ['timeout', '-s', 'KILL', '35', '/usr/local/bin/cudaCheck']
* 23 cudaCheck -> execve /usr/local/bin/cudaCheck ['/usr/local/bin/cudaCheck']
* 26 cat -> execve /usr/bin/cat ['cat', '/sys/module/mlx5_core/version']
* 28 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel Module\\( for *\\| \\) *\\([^()
]*\\).*$/\\2/p', '/proc/driver/nvidia/version']
* 29 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 34 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 32 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q', '-d', 'COMPUTE']
* 33 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 38 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 40 cat -> execve /usr/bin/cat ['cat', '/sys/module/mlx5_core/version']
Network Usage:
eth0
start : rx=2.2KiB tx=0.0B
cuda_initialized : rx=2.2KiB tx=0.0B
end : rx=2.2KiB tx=0.0B
end : rx=2.2KiB tx=0.0B
Network Activity:
* [23] cudaCheck -> connect /tmp/nvidia-mps/control
* [41] fuse -> connect /var/run/nscd/socket
* [41] fuse -> connect /tmp/ucx-vfs-root.sock
* [41] python3 -> connect /tmp/nvidia-mps/control
* [1] fuse -> connect /var/run/nscd/socket
* [1] fuse -> connect /tmp/ucx-vfs-root.sock
* [1] python3 -> connect /tmp/nvidia-mps/control
File Accesses:
* /app/Megatron-LM
* /app/workspace
* /app/workspace/__pycache__/dyana.cpython-310.pyc.135013039758048
* /app/workspace/dyana.py
* /app/workspace/entrypoint.sh
* /app/workspace/main.py
* /usr/bin/cat
* /usr/bin/cut
* /usr/bin/grep
* /usr/bin/hostname
* /usr/bin/nvidia-smi
* /usr/bin/python3
* /usr/bin/rm
* /usr/bin/sed
* /usr/bin/timeout
* /usr/bin/touch
* /usr/local/bin/cudaCheck
* /usr/local/cuda/compat/.570.86.10.76bc0a63b927.checked
* /usr/local/cuda/compat/lib.real/libcuda.so.1
* /usr/local/cuda/lib64/libcublas.so.12
* /usr/local/cuda/lib64/libcublasLt.so.12
* /usr/local/cuda/lib64/libcudart.so.12
* /usr/local/cuda/lib64/libcufft.so.11
* /usr/local/cuda/lib64/libcupti.so.12
* /usr/local/cuda/lib64/libcurand.so.10
* /usr/local/cuda/lib64/libcusparse.so.12
* /usr/local/cuda/lib64/libnvJitLink.so.12
* /usr/local/cuda/lib64/libnvToolsExt.so.1
* 3720 accesses to /usr/local/lib/*
* 1591 accesses to /usr/lib/*
* 36 accesses to /lib/*
* 90 accesses to /dev/*
* 146 accesses to /proc/*
* 58 accesses to /sys/*
* 114 accesses to /etc/*
* 2 accesses to /tmp/*
Security Events:
* Dynamic code loading detected (defense-evasion, moderate severity)
mind taking another peek when you have a sec? tyia my dude! 🤜 |
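A side note on the Process Executions list in the trace above, offered as a reading of the log rather than a confirmed diagnosis. Two things look off: the CUDA check (pid 41) receives its `-c` program split into several words, and `main.py` (pid 1) receives `/model_optim_rng.pt` without the `--model` flag that the loader passed. The sketch below reproduces both with the standard library only. It assumes `printf` wrote each `\"` from the Dockerfile step as a plain `"` (which the traced argv suggests), and `bash_c_positionals` is a hypothetical helper that only models how `bash -c SCRIPT a b c` binds `a` to `$0`.

```python
import shlex

# Line written into entrypoint.sh, ASSUMING printf emitted every \" as a plain ".
CHECK_LINE = (
    'python3 -c "import torch; assert torch.cuda.is_available(), '
    '"CUDA is not available"; device=torch.cuda.get_device_name(); '
    'print(f"CUDA OK: {device}")"'
)

# Arguments dyana reports passing to the container (copied from the trace).
LOADER_ARGS = [
    "--model", "/model_optim_rng.pt",
    "--tokenizer", "/llama-2-7b-tokenizer.model",
    "--size", "7B",
    "--input", "This is an example prompt.",
]


def bash_c_positionals(extra_args):
    """Hypothetical model of `bash -c SCRIPT a b c`: a is $0, "$@" is [b, c]."""
    return extra_args[0], list(extra_args[1:])


# shlex.split applies POSIX-style quoting rules, close to how bash splits this line.
check_argv = shlex.split(CHECK_LINE)
dollar_zero, forwarded = bash_c_positionals(LOADER_ARGS)

print(check_argv[2])  # the only text Python receives as the -c program
print(dollar_zero)    # the word swallowed as $0
print(forwarded[0])   # first argument that actually reaches entrypoint.sh
```

If this reading is right, only the `assert torch.cuda.is_available(), CUDA` fragment runs as the check, so it can pass without printing anything, and `--model` never reaches `main.py`. Whether that has anything to do with the small GPU delta is untested.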
@GangGreenTemperTatum good progress!!!! i think there's still something not working as expected tho, i think it should be way more than a +76.0MiB on GPU? Or maybe DMC is really that good? XD |
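For a rough sense of scale behind that question, here is a back-of-the-envelope estimate of what the weights alone of a 7B model would occupy on the GPU. The parameter count (a nominal 7e9) and the 2 bytes per parameter (fp16/bf16) are assumptions, not values read from the checkpoint, and `weight_footprint_gib` is a hypothetical helper.

```python
def weight_footprint_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 7e9        # ASSUMPTION: nominal "7B", not the exact checkpoint count
BYTES_PER_PARAM = 2   # ASSUMPTION: fp16 or bf16 weights

expected_gib = weight_footprint_gib(N_PARAMS, BYTES_PER_PARAM)
observed_mib = 76.0   # GPU delta reported by dyana in the trace above
ratio = observed_mib / (expected_gib * 1024)

print(f"expected ~{expected_gib:.1f} GiB of weights, observed {observed_mib} MiB ({ratio:.2%})")
```

As far as I know, DMC compresses the KV cache rather than the weights, so under these assumptions a fully loaded model would still be expected to show a delta in the GiB range.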
thank you!!! 🤍 haha i know right!? XD i'll keep tweaking/testing to see if this threshold changes from future code changes and send another push once feeling confident |
what's interesting is that when i run the trace, i run
Device 0 [NVIDIA A100 80GB PCIe] PCIe GEN 4@16x RX: 300.0 KiB/s TX: 300.0 KiB/s
GPU 210MHz MEM 1512MHz TEMP 31°C FAN N/A% POW 44 / 300 W
GPU[ 0%] MEM[ 0.752Gi/80.000Gi]
Device 1 [NVIDIA A100 80GB PCIe] PCIe GEN 4@16x RX: 300.0 KiB/s TX: 350.0 KiB/s
GPU 210MHz MEM 1512MHz TEMP 32°C FAN N/A% POW 42 / 300 W
GPU[ 0%] MEM[ 0.752Gi/80.000Gi]
right at the point of the trace execution (from
GPU Usage:
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 422.0MiB
* cuda_initialized : 498.0MiB 🔺 76.0MiB
* end : 498.0MiB
* end : 498.0MiB
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 422.0MiB
* cuda_initialized : 422.0MiB
* end : 422.0MiB
* end : 422.0MiB
maybe i should get some more checkpoints from the DMC page and give them a spin/compare |
@GangGreenTemperTatum mmm ... what if you remove all those env vars? I have the feeling they are messing with how we profile GPU memory consumption |
great call! latest tests (same tokenizer):
(dyana-py3.10) ads@planetexpress:~/git/dyana$ docker system prune -af && dyana trace --loader megatron --allow-gpus --allow-network --timeout 120 --model /home/ads/model_optim_rn-Llama-2-7B-DMC-4x.pt --tokenizer /home/ads/llama-2-7b-tokenizer.model --size 7B --verbose
RAM Usage:
* start : 382.0MiB
* cuda_initialized : 487.1MiB 🔺 105.2MiB
* end : 487.1MiB
* end : 487.5MiB 🔺 384.0KiB
GPU Usage:
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 422.0MiB
* cuda_initialized : 498.0MiB 🔺 76.0MiB
* end : 498.0MiB
* end : 498.0MiB
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 422.0MiB
* cuda_initialized : 422.0MiB
* end : 422.0MiB
* end : 422.0MiB
i removed the env vars and got the same results:
RAM Usage:
* start : 380.4MiB
* cuda_initialized : 578.9MiB 🔺 198.6MiB
* end : 578.9MiB
* end : 578.9MiB
GPU Usage:
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 423.0MiB
* cuda_initialized : 499.0MiB 🔺 76.0MiB
* end : 499.0MiB
* end : 499.0MiB
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 423.0MiB
* cuda_initialized : 423.0MiB
* end : 423.0MiB
* end : 423.0MiB
it takes a good few minutes but i plan to test the rest of the DMC models and will record it all in here :)
(dyana-py3.10) ads@planetexpress:~/git/dyana$ docker system prune -af && dyana trace --loader megatron --allow-gpus --allow-network --timeout 120 --model /home/ads/model_optim_rn-Llama-2-7B-DMC-8x.pt --tokenizer /home/ads/llama-2-7b-tokenizer.model --size 7B --verbose
(dyana-py3.10) ads@planetexpress:~/git/dyana$ docker system prune -af && dyana trace --loader megatron --allow-gpus --allow-network --timeout 120 --model /home/ads/model_optim_rn-Llama-2-13B-DMC-4x.pt --tokenizer /home/ads/llama-2-7b-tokenizer.model --size 7B --verbose
(dyana-py3.10) ads@planetexpress:~/git/dyana$ docker system prune -af && dyana trace --loader megatron --allow-gpus --allow-network --timeout 120 --model /home/ads/model_optim_rn-Llama-2-13B-DMC-8x.pt --tokenizer /home/ads/llama-2-7b-tokenizer.model --size 7B --verbose
@GangGreenTemperTatum does what you see in dyana match what you see in nvtop?
yep! it's just the smallest blip in
damn, this DMC is pretty cool :D
i even tried to grab a video on my mac of nvtop/dyana and it's so quick it's almost impossible to spot by eye XD
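Since the blip is too quick to catch by eye, one option is to sample GPU memory in a tight loop while the trace runs and keep the per-GPU maximum. The sketch below is only an illustration: the helper names `parse_memory_used` and `poll_peak` are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports `memory.used` as plain integers in MiB.

```python
import shutil
import subprocess
import time

# Query used memory per GPU as bare integers (MiB), one line per device.
QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_memory_used(output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from the CSV output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def poll_peak(duration_s: float = 10.0, interval_s: float = 0.1) -> list[int]:
    """Sample repeatedly and return the highest value seen for each GPU."""
    peaks: list[int] = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        used = parse_memory_used(out)
        peaks = used if not peaks else [max(p, u) for p, u in zip(peaks, used)]
        time.sleep(interval_s)
    return peaks


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; nothing to poll")
    else:
        print("peak MiB per GPU:", poll_peak())
```

Starting this in a second terminal just before `dyana trace` and comparing the peaks with the summary's GPU Usage section would show whether a short-lived allocation is being missed between stages.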
is the model generating coherent text?
great call! afaik i'm using the correct tokenizer from their example 🤔
latest commit (some points as discussed)
docker system prune -af && dyana trace --loader megatron --allow-gpus --allow-network --runtime nvidia --model /path/to/model.pt --size 7B --verbose
# OR
docker system prune -af && dyana trace --loader megatron --allow-gpus --allow-network --runtime nvidia --model /path/to/model.pt --tokenizer /path/to/llama-2-7b-tokenizer.model --size 7B --verbose
│ 659 │ def model_validate_strings( │
│ │
│ ╭─────────────────────────── locals ────────────────────────────╮ │
│ │ context = None │ │
│ │ json_data = 't recent call last):\n File │ │
│ │ "/app/workspace/main.py", line 12, in <module>\n │ │
│ │ f'+99 │ │
│ │ strict = None │ │
│ ╰───────────────────────────────────────────────────────────────╯ │
╰───────────────────────────────────────────────────────────────────╯
ValidationError: 1 validation error for Run
Invalid JSON: expected ident at line 1 column 2 [type=json_invalid,
input_value='t recent call last):\n ...ed \'megatron.model\'\n',
input_type=str]
For further information visit
https://errors.pydantic.dev/2.10/v/json_invalid
Step 1/20 : FROM nvcr.io/nvidia/pytorch:24.04-py3
---> 3f0b23af1f4f
Step 2/20 : WORKDIR /app
---> Running in f468ce710b95
---> Removed intermediate container f468ce710b95
---> beff3d97d726
Step 3/20 : RUN apt-get update && apt-get install -y
--no-install-recommends git ca-certificates
build-essential && rm -rf /var/lib/apt/lists/*
---> Running in 6712f070f80c
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129
kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:3 http://security.ubuntu.com/ubuntu jammy-security/main amd64
Packages [2606 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128
kB]
Get:5 http://security.ubuntu.com/ubuntu jammy-security/multiverse
amd64 Packages [45.2 kB]
Get:6 http://security.ubuntu.com/ubuntu jammy-security/universe amd64
Packages [1230 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/restricted
amd64 Packages [3606 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127
kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
[17.5 MB]
Get:10 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64
Packages [266 kB]
Get:11 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
[1792 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy/restricted amd64
Packages [164 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64
Packages [2907 kB]
Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64
Packages [1523 kB]
Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/restricted
amd64 Packages [3742 kB]
Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse
amd64 Packages [53.3 kB]
Get:17 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64
Packages [81.4 kB]
Get:18 http://archive.ubuntu.com/ubuntu jammy-backports/universe
amd64 Packages [35.2 kB]
Fetched 36.2 MB in 3s (11.7 MB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
build-essential is already the newest version (12.9ubuntu3).
Suggested packages:
gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-email
git-gui
gitk gitweb git-cvs git-mediawiki git-svn
The following packages will be upgraded:
ca-certificates git
2 upgraded, 0 newly installed, 0 to remove and 68 not upgraded.
Need to get 3328 kB of archives.
After this operation, 29.7 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64
ca-certificates all 20240203~22.04.1 [162 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 git
amd64 1:2.34.1-1ubuntu1.12 [3165 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 3328 kB in 1s (3923 kB/s)
(Reading database ... 23388 files and directories currently installed.)
Preparing to unpack .../ca-certificates_20240203~22.04.1_all.deb ...
Unpacking ca-certificates (20240203~22.04.1) over
(20230311ubuntu0.22.04.1) ...
Preparing to unpack .../git_1%3a2.34.1-1ubuntu1.12_amd64.deb ...
Unpacking git (1:2.34.1-1ubuntu1.12) over (1:2.34.1-1ubuntu1.10) ...
Setting up ca-certificates (20240203~22.04.1) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain
exactly one certificate or CRL
14 added, 5 removed; done.
Setting up git (1:2.34.1-1ubuntu1.12) ...
Processing triggers for ca-certificates (20240203~22.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
---> Removed intermediate container 6712f070f80c
---> f09a8af13a3e
Step 4/20 : RUN mkdir -p /dev/shm && mkdir -p
/tmp/pytorch_extensions && mkdir -p /run/shm && chmod -R 777
/dev/shm /tmp/pytorch_extensions /run/shm
---> Running in 39e52cfd1ff1
---> Removed intermediate container 39e52cfd1ff1
---> 6b0998e90482
Step 5/20 : RUN mkdir -p /dev/shm && mkdir -p /run/shm &&
mkdir -p /tmp/pytorch_extensions && mkdir -p
/tmp/.pytorch_jit_cache && mkdir -p /tmp/transformers &&
chmod -R 777 /dev/shm /run/shm /tmp/pytorch_extensions
/tmp/.pytorch_jit_cache /tmp/transformers
---> Running in 1281351b3bd8
---> Removed intermediate container 1281351b3bd8
---> cf25ed3a32c4
Step 6/20 : RUN python3 -c "import torch; print(f'PyTorch version:
{torch.__version__}')"
---> Running in dac73bbc2a8f
PyTorch version: 2.3.0a0+6ddf5cf85e.nv24.04
---> Removed intermediate container dac73bbc2a8f
---> 9e0fd4d3665c
Step 7/20 : RUN mkdir -p /app/workspace
---> Running in cda0e195ea52
---> Removed intermediate container cda0e195ea52
---> a076f1d5cc90
Step 8/20 : COPY requirements.txt /app/workspace/
---> 4625a317b204
Step 9/20 : COPY *.py /app/workspace/
---> 772859e23adf
Step 10/20 : COPY dyana-requirements*.txt /app/workspace/
---> d5a228fdd290
Step 11/20 : WORKDIR /app/workspace
---> Running in 2c796ca659f2
---> Removed intermediate container 2c796ca659f2
---> 52f422848553
Step 12/20 : RUN pip install --no-cache-dir -r requirements.txt
---> Running in d244b710fb88
Looking in indexes: https://pypi.org/simple,
https://pypi.ngc.nvidia.com, https://download.pytorch.org/whl/cu121
Looking in links:
https://developer.download.nvidia.com/compute/redist
Requirement already satisfied: torch>=2.0.0 in
/usr/local/lib/python3.10/dist-packages (from -r requirements.txt
(line 5)) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging>=20.0 in
/usr/local/lib/python3.10/dist-packages (from -r requirements.txt
(line 6)) (23.2)
Requirement already satisfied: typing_extensions>=4.0.0 in
/usr/local/lib/python3.10/dist-packages (from -r requirements.txt
(line 7)) (4.10.0)
Collecting flash-attn==2.6.1 (from -r requirements.txt (line 10))
Downloading flash_attn-2.6.1.tar.gz (2.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 16.3 MB/s eta
0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting sentencepiece==0.2.0 (from -r requirements.txt (line 11))
Downloading
sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x
86_64.whl.metadata (7.7 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 12))
Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting hydra_colorlog==1.2.0 (from -r requirements.txt (line 13))
Downloading hydra_colorlog-1.2.0-py3-none-any.whl.metadata (949
bytes)
Collecting nltk (from -r requirements.txt (line 14))
Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting datasets (from -r requirements.txt (line 15))
Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting transformers>=4.38.0 (from -r requirements.txt (line 16))
Downloading transformers-4.48.2-py3-none-any.whl.metadata (44 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.4/44.4 kB 201.2 MB/s eta
0:00:00
Requirement already satisfied: psutil>=5.6.7 in
/usr/local/lib/python3.10/dist-packages (from -r requirements.txt
(line 19)) (5.9.4)
Requirement already satisfied: einops in
/usr/local/lib/python3.10/dist-packages (from flash-attn==2.6.1->-r
requirements.txt (line 10)) (0.7.0)
Collecting omegaconf<2.4,>=2.2 (from hydra-core==1.3.2->-r
requirements.txt (line 12))
Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r
requirements.txt (line 12))
Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 135.0 MB/s eta
0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting colorlog (from hydra_colorlog==1.2.0->-r requirements.txt
(line 13))
Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: filelock in
/usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.13.3)
Requirement already satisfied: sympy in
/usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (1.12)
Requirement already satisfied: networkx in
/usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2.6.3)
Requirement already satisfied: jinja2 in
/usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (3.1.3)
Requirement already satisfied: fsspec in
/usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->-r
requirements.txt (line 5)) (2024.2.0)
Requirement already satisfied: click in
/usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (8.1.7)
Requirement already satisfied: joblib in
/usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in
/usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (2023.12.25)
Requirement already satisfied: tqdm in
/usr/local/lib/python3.10/dist-packages (from nltk->-r
requirements.txt (line 14)) (4.66.2)
Requirement already satisfied: numpy>=1.17 in
/usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.24.4)
Collecting pyarrow>=15.0.0 (from datasets->-r requirements.txt (line
15))
Downloading
pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3
kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets->-r requirements.txt
(line 15))
Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: pandas in
/usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (1.5.3)
Collecting requests>=2.32.2 (from datasets->-r requirements.txt (line
15))
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tqdm (from nltk->-r requirements.txt (line 14))
Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.7/57.7 kB 207.7 MB/s eta
0:00:00
Collecting xxhash (from datasets->-r requirements.txt (line 15))
Downloading
xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.w
hl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets->-r requirements.txt
(line 15))
Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: aiohttp in
/usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (3.9.3)
Collecting huggingface-hub>=0.23.0 (from datasets->-r
requirements.txt (line 15))
Downloading huggingface_hub-0.28.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: pyyaml>=5.1 in
/usr/local/lib/python3.10/dist-packages (from datasets->-r
requirements.txt (line 15)) (6.0.1)
Collecting tokenizers<0.22,>=0.21 (from transformers>=4.38.0->-r
requirements.txt (line 16))
Downloading
tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_6
4.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers>=4.38.0->-r
requirements.txt (line 16))
Downloading
safetensors-0.5.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_6
4.whl.metadata (3.8 kB)
Requirement already satisfied: aiosignal>=1.1.2 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->-r
requirements.txt (line 15)) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->-r
requirements.txt (line 15)) (23.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->-r
requirements.txt (line 15)) (1.4.1)
Requirement already satisfied: multidict<7.0,>=4.5 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->-r
requirements.txt (line 15)) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->-r
requirements.txt (line 15)) (1.9.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->-r
requirements.txt (line 15)) (4.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in
/usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in
/usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in
/usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in
/usr/local/lib/python3.10/dist-packages (from
requests>=2.32.2->datasets->-r requirements.txt (line 15)) (2024.2.2)
Requirement already satisfied: MarkupSafe>=2.0 in
/usr/local/lib/python3.10/dist-packages (from
jinja2->torch>=2.0.0->-r requirements.txt (line 5)) (2.1.5)
Requirement already satisfied: python-dateutil>=2.8.1 in
/usr/local/lib/python3.10/dist-packages (from pandas->datasets->-r
requirements.txt (line 15)) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in
/usr/local/lib/python3.10/dist-packages (from pandas->datasets->-r
requirements.txt (line 15)) (2024.1)
Requirement already satisfied: mpmath>=0.19 in
/usr/local/lib/python3.10/dist-packages (from sympy->torch>=2.0.0->-r
requirements.txt (line 5)) (1.3.0)
Requirement already satisfied: six>=1.5 in
/usr/local/lib/python3.10/dist-packages (from
python-dateutil>=2.8.1->pandas->datasets->-r requirements.txt (line
15)) (1.16.0)
Downloading
sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x
86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 46.2 MB/s eta
0:00:00
Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.5/154.5 kB 90.0 MB/s eta
0:00:00
Downloading hydra_colorlog-1.2.0-py3-none-any.whl (3.6 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 48.2 MB/s eta
0:00:00
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.6/480.6 kB 63.1 MB/s eta
0:00:00
Downloading transformers-4.48.2-py3-none-any.whl (9.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 50.8 MB/s eta
0:00:00
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 162.0 MB/s eta
0:00:00
Downloading huggingface_hub-0.28.1-py3-none-any.whl (464 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 464.1/464.1 kB 88.2 MB/s eta
0:00:00
Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 136.3 MB/s eta
0:00:00
Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 139.6 MB/s eta
0:00:00
Downloading pyarrow-19.0.0-cp310-cp310-manylinux_2_28_x86_64.whl
(42.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 116.8 MB/s eta
0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 121.2 MB/s eta
0:00:00
Downloading
safetensors-0.5.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_6
4.whl (461 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 462.0/462.0 kB 132.3 MB/s eta
0:00:00
Downloading
tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_6
4.whl (3.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 118.6 MB/s eta
0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 122.7 MB/s eta
0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading
xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.w
hl (194 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 155.2 MB/s eta
0:00:00
Building wheels for collected packages: flash-attn,
antlr4-python3-runtime
Building wheel for flash-attn (setup.py): started
Building wheel for flash-attn (setup.py): finished with status 'done'
Created wheel for flash-attn:
filename=flash_attn-2.6.1-cp310-cp310-linux_x86_64.whl size=198444860
sha256=8da92ac0324e367e37327e6cdd621b9e0a02e72b6328c7ad8096ffa4a9c9b6
99
Stored in directory:
/tmp/pip-ephem-wheel-cache-75z6clos/wheels/91/6a/38/f0faa036b4ac73a73
247386f1ab1bb4cb4f6e72e6861a779f1
Building wheel for antlr4-python3-runtime (setup.py): started
Building wheel for antlr4-python3-runtime (setup.py): finished with
status 'done'
Created wheel for antlr4-python3-runtime:
filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554
sha256=9c23707dd19bc41339473a9c1c4ca74c3389f29824f8a53bf013b7ccf6a99c
0e
Stored in directory:
/tmp/pip-ephem-wheel-cache-75z6clos/wheels/12/93/dd/1f6a127edc4565955
6564c5730f6d4e300888f4bca2d4c5a88
Successfully built flash-attn antlr4-python3-runtime
Installing collected packages: sentencepiece, antlr4-python3-runtime,
xxhash, tqdm, safetensors, requests, pyarrow, omegaconf, dill,
colorlog, nltk, multiprocess, hydra-core, huggingface-hub,
tokenizers, hydra_colorlog, flash-attn, transformers, datasets
Attempting uninstall: tqdm
Found existing installation: tqdm 4.66.2
Uninstalling tqdm-4.66.2:
Successfully uninstalled tqdm-4.66.2
Attempting uninstall: requests
Found existing installation: requests 2.31.0
Uninstalling requests-2.31.0:
Successfully uninstalled requests-2.31.0
Attempting uninstall: pyarrow
Found existing installation: pyarrow 14.0.1
Uninstalling pyarrow-14.0.1:
Successfully uninstalled pyarrow-14.0.1
Attempting uninstall: flash-attn
Found existing installation: flash-attn 2.4.2
Uninstalling flash-attn-2.4.2:
Successfully uninstalled flash-attn-2.4.2
ERROR: pip's dependency resolver does not currently take into
account all the packages that are installed. This behaviour is the
source of the following dependency conflicts.
torchtext 0.17.0a0 requires torch==2.3.0a0+6ddf5cf, but you have
torch 2.3.0a0+6ddf5cf85e.nv24.4 which is incompatible.
transformer-engine 1.5.0+6a9edc3 requires
flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6, but you have flash-attn
2.6.1 which is incompatible.
Successfully installed antlr4-python3-runtime-4.9.3 colorlog-6.9.0
datasets-3.2.0 dill-0.3.8 flash-attn-2.6.1 huggingface-hub-0.28.1
hydra-core-1.3.2 hydra_colorlog-1.2.0 multiprocess-0.70.16 nltk-3.9.1
omegaconf-2.3.0 pyarrow-19.0.0 requests-2.32.3 safetensors-0.5.2
sentencepiece-0.2.0 tokenizers-0.21.0 tqdm-4.67.1 transformers-4.48.2
xxhash-3.5.0
WARNING: Running pip as the 'root' user can result in broken
permissions and conflicting behaviour with the system package
manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container d244b710fb88
---> b6caeb19e504
Step 13/20 : RUN git clone --depth 1 --branch dmc
https://github.com/NVIDIA/Megatron-LM.git /app/Megatron-LM && cd
/app/Megatron-LM && pip install -e .
---> Running in 655f14c9daaa
Cloning into '/app/Megatron-LM'...
Looking in indexes: https://pypi.org/simple,
https://pypi.ngc.nvidia.com
Obtaining file:///app/Megatron-LM
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with
status 'done'
Getting requirements to build editable: started
Getting requirements to build editable: finished with status 'done'
Preparing editable metadata (pyproject.toml): started
Preparing editable metadata (pyproject.toml): finished with status
'done'
Requirement already satisfied: torch in
/usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (2.3.0a0+6ddf5cf85e.nv24.4)
Requirement already satisfied: packaging in
/usr/local/lib/python3.10/dist-packages (from
megatron-core==0.10.0rc0) (23.2)
Requirement already satisfied: filelock in
/usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.13.3)
Requirement already satisfied: typing-extensions>=4.8.0 in
/usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (4.10.0)
Requirement already satisfied: sympy in
/usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (1.12)
Requirement already satisfied: networkx in
/usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2.6.3)
Requirement already satisfied: jinja2 in
/usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (3.1.3)
Requirement already satisfied: fsspec in
/usr/local/lib/python3.10/dist-packages (from
torch->megatron-core==0.10.0rc0) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in
/usr/local/lib/python3.10/dist-packages (from
jinja2->torch->megatron-core==0.10.0rc0) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in
/usr/local/lib/python3.10/dist-packages (from
sympy->torch->megatron-core==0.10.0rc0) (1.3.0)
Building wheels for collected packages: megatron-core
Building editable for megatron-core (pyproject.toml): started
Building editable for megatron-core (pyproject.toml): finished with
status 'done'
Created wheel for megatron-core:
filename=megatron_core-0.10.0rc0-0.editable-cp310-cp310-linux_x86_64.
whl size=16959
sha256=2cbad7ccbb67f8168bacec77cb017f982b24325875a965e1084e15fa5870ba
12
Stored in directory:
/tmp/pip-ephem-wheel-cache-1mwzpgjm/wheels/ac/13/e6/1ba2b5b3bea71b4ae
468d84030e90cc700c9d586023e483a1e
Successfully built megatron-core
Installing collected packages: megatron-core
Successfully installed megatron-core-0.10.0rc0
WARNING: Running pip as the 'root' user can result in broken
permissions and conflicting behaviour with the system package
manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv
A new release of pip is available: 24.0 -> 25.0
To update, run: python -m pip install --upgrade pip
---> Removed intermediate container 655f14c9daaa
---> 841d8741637d
Step 14/20 : ENV PYTHONPATH=/app/Megatron-LM:$PYTHONPATH
---> Running in 176506718e1a
---> Removed intermediate container 176506718e1a
---> 92770022670c
Step 15/20 : RUN mkdir -p /dev/shm && mkdir -p
/tmp/pytorch_extensions && chmod -R 777 /dev/shm
/tmp/pytorch_extensions
---> Running in f76911bac930
---> Removed intermediate container f76911bac930
---> 367e108d8748
Step 16/20 : RUN printf '#!/bin/bash\npython3 -c "import torch;
assert torch.cuda.is_available(), \"CUDA is not available\";
device=torch.cuda.get_device_name(); print(f\"CUDA OK: {device}\")"
&& export PYTHONPATH=/app/workspace:/app/Megatron-LM:$PYTHONPATH &&
exec python3 -W ignore main.py "$@"' > /app/workspace/entrypoint.sh
&& chmod +x /app/workspace/entrypoint.sh
---> Running in 6bd001391d73
---> Removed intermediate container 6bd001391d73
---> 62d843f609e5
Step 17/20 : RUN ls -la /app/workspace && ls -la
/app/workspace/entrypoint.sh && test -x
/app/workspace/entrypoint.sh
---> Running in 579931b3315b
total 52
drwxr-xr-x 1 root root 4096 Feb 5 20:07 .
drwxr-xr-x 1 root root 4096 Feb 5 20:07 ..
-rw-rw-r-- 1 root root 42 Feb 5 20:02 dyana-requirements-gpu.txt
-rw-rw-r-- 1 root root 29 Feb 5 20:02 dyana-requirements.txt
-rw-rw-r-- 1 root root 6886 Feb 5 20:02 dyana.py
-rwxr-xr-x 1 root root 266 Feb 5 20:07 entrypoint.sh
-rw-rw-r-- 1 root root 15389 Feb 5 19:54 main.py
-rw-rw-r-- 1 root root 386 Feb 5 18:43 requirements.txt
-rw-rw-r-- 1 root root 594 Feb 4 15:24 verify.py
-rwxr-xr-x 1 root root 266 Feb 5 20:07 /app/workspace/entrypoint.sh
---> Removed intermediate container 579931b3315b
---> dff8fa7afd2c
Step 18/20 : RUN chown -R root:root /app && chmod -R 755 /app &&
chmod +x /app/workspace/entrypoint.sh
---> Running in 55d7f4f7639b
---> Removed intermediate container 55d7f4f7639b
---> d803d7d54231
Step 19/20 : SHELL ["/bin/bash", "-c"]
---> Running in a3c3f0ab9b91
---> Removed intermediate container a3c3f0ab9b91
---> 84abf848fdee
Step 20/20 : ENTRYPOINT ["/bin/bash", "-c", "exec
/app/workspace/entrypoint.sh \"$@\""]
---> Running in c5fee62db347
---> Removed intermediate container c5fee62db347
---> 84ebdb4bf420
Successfully built 84ebdb4bf420
Successfully tagged dyana-megatron-loader:latest
👁️🗨️ tracer: initializing ...
👁️🗨️ tracer: started ...
🍿 loader: warning: allowing bridged network access to the container
🍿 loader: executing with arguments ['--model',
'/model_optim_rn-llama-2-7b-dmc-4x.pt', '--size', '7B', '--input',
'This is an example prompt.'] ...
👁️🗨️ tracer: stopping ...
🗃 saving 10293 events to trace.json
Platform : Linux-6.8.0-52-generic-x86_64-with-glibc2.35
Loader : megatron
Arguments : --model /model_optim_rn-llama-2-7b-dmc-4x.pt --size
7B --input This is an example prompt.
Volumes : /model_optim_rn-llama-2-7b-dmc-4x.pt
(/home/ads/model_optim_rn-Llama-2-7B-DMC-4x.pt)
Started at : 2025-02-05T13:07:46.476947
Ended at : 2025-02-05T13:07:55.272080
Total Events : 10293
RAM Usage:
* start : 381.5MiB
* cuda_initialized : 581.8MiB 🔺 200.3MiB
* end : 15.2GiB 🔺 14.7GiB
* end : 754.5MiB
GPU Usage:
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 673.4MiB
* cuda_initialized : 749.4MiB 🔺 76.0MiB
* end : 749.4MiB
* end : 749.4MiB
NVIDIA A100 80GB PCIe | 79.3GiB
* start : 436.5MiB
* cuda_initialized : 436.5MiB
* end : 436.5MiB
* end : 436.5MiB
Disk Usage:
* start : 1.3TiB
* cuda_initialized : 1.3TiB 🔺 4.0KiB
* end : 1.3TiB 🔺 22.0MiB
* end : 1.3TiB 🔺 52.0KiB
Process Executions:
* 1 python3 -> execve /usr/bin/python3 ['python3', '-W', 'ignore',
'main.py', '/model_optim_rn-llama-2-7b-dmc-4x.pt', '--size', '7B',
'--input', 'This is an example prompt.']
* 11 hostname -> execve /usr/bin/hostname ['hostname']
* 20 touch -> execve /usr/bin/touch ['touch',
'/usr/local/cuda/compat/.570.86.10.b8ed4c508d77.checked']
* 21 rm -> execve /usr/bin/rm ['rm', '-f',
'/usr/local/cuda/compat/lib']
* 31 hostname -> execve /usr/bin/hostname ['hostname']
* 42 python3 -> execve /usr/bin/python3 ['python3', '-c', 'import
torch; assert torch.cuda.is_available(), CUDA', 'is', 'not',
'available; device=torch.cuda.get_device_name(); print(fCUDA', 'OK:',
'{device})']
* 143 python3 -> execve /usr/bin/python3 ['/usr/bin/python3', '-m',
'pip', 'show', 'transformer_engine']
* 10 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 9 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel
Module\\( for *\\| \\) *\\([^() ]*\\).*$/\\2/p',
'/proc/driver/nvidia/version']
* 13 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q',
'-d', 'COMPUTE']
* 15 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 14 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 19 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 23 timeout -> execve /usr/bin/timeout ['timeout', '-s', 'KILL',
'35', '/usr/local/bin/cudaCheck']
* 24 cudaCheck -> execve /usr/local/bin/cudaCheck
['/usr/local/bin/cudaCheck']
* 27 cat -> execve /usr/bin/cat ['cat',
'/sys/module/mlx5_core/version']
* 30 sed -> execve /usr/bin/sed ['sed', 's/^$/unknown/']
* 29 sed -> execve /usr/bin/sed ['sed', '-n', 's/^NVRM.*Kernel
Module\\( for *\\| \\) *\\([^() ]*\\).*$/\\2/p',
'/proc/driver/nvidia/version']
* 33 nvidia-smi -> execve /usr/bin/nvidia-smi ['nvidia-smi', '-q',
'-d', 'COMPUTE']
* 34 grep -> execve /usr/bin/grep ['grep', '^CUDA Version']
* 35 sed -> execve /usr/bin/sed ['sed', 's/^.*: //']
* 39 cut -> execve /usr/bin/cut ['cut', '-d', '.', '-f', '1-2']
* 41 cat -> execve /usr/bin/cat ['cat',
'/sys/module/mlx5_core/version']
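Side note on the PID 42 entry in the list above: the `python3 -c` arguments are split at the inner quotes, so the "CUDA OK" message can never print. A plausible cause (an assumption based on Step 16 of the build log, not confirmed against the image) is that `printf` turns each `\"` into a plain `"`, leaving nested unescaped double quotes in entrypoint.sh. The sketch below uses Python's `shlex.split` as an approximation of shell word splitting to show how such a line would be tokenized.

```python
import shlex

# What entrypoint.sh would contain if printf turned every \" into a plain "
# (reconstructed from the Dockerfile step; an assumption, not read from the image).
line = (
    'python3 -c "import torch; assert torch.cuda.is_available(), '
    '"CUDA is not available"; device=torch.cuda.get_device_name(); '
    'print(f"CUDA OK: {device}")"'
)

# POSIX-style quoting rules: adjacent quoted and unquoted pieces are joined,
# and unquoted whitespace starts a new argument.
argv = shlex.split(line)
for arg in argv:
    print(repr(arg))
```

If that is what is happening, writing the inner Python strings with single quotes, or copying entrypoint.sh into the image as a file instead of generating it with `printf`, would avoid the nested quoting.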
Network Usage:
eth0
start : rx=2.0KiB tx=0.0B
cuda_initialized : rx=2.0KiB tx=0.0B
end : rx=2.6KiB 🔺 591.0B tx=0.0B
end : rx=2.6KiB tx=0.0B
Network Activity:
* [24] cudaCheck -> connect /tmp/nvidia-mps/control
* [42] fuse -> connect /var/run/nscd/socket
* [42] fuse -> connect /tmp/ucx-vfs-root.sock
* [42] python3 -> connect /tmp/nvidia-mps/control
* [1] fuse -> connect /var/run/nscd/socket
* [1] fuse -> connect /tmp/ucx-vfs-root.sock
* [1] python3 -> connect /tmp/nvidia-mps/control
File Accesses:
* /app/Megatron-LM
* /app/Megatron-LM/megatron
* /app/Megatron-LM/megatron/core
* /app/Megatron-LM/megatron/core/__init__.py
* /app/Megatron-LM/megatron/core/__pycache__/__init__.cpython-310.pyc.130557660599856
* /app/Megatron-LM/megatron/core/__pycache__/parallel_state.cpython-310.pyc.130557660601136
* /app/Megatron-LM/megatron/core/__pycache__/utils.cpython-310.pyc.130557660601008
* /app/Megatron-LM/megatron/core/dist_checkpointing
* /app/Megatron-LM/megatron/core/dist_checkpointing/__init__.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/__init__.cpython-310.pyc.130557660819456
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/core.cpython-310.pyc.130557660819888
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/dict_utils.cpython-310.pyc.130557660822048
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/exchange_utils.cpython-310.pyc.130557616005616
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/mapping.cpython-310.pyc.130557660821040
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/serialization.cpython-310.pyc.130557616004752
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/state_dict_transformation.cpython-310.pyc.130557660537328
* /app/Megatron-LM/megatron/core/dist_checkpointing/__pycache__/utils.cpython-310.pyc.130557616006048
* /app/Megatron-LM/megatron/core/dist_checkpointing/core.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/dict_utils.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/exchange_utils.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/mapping.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/serialization.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/state_dict_transformation.py
* /app/Megatron-LM/megatron/core/dist_checkpointing/utils.py
* /app/Megatron-LM/megatron/core/parallel_state.py
* /app/Megatron-LM/megatron/core/tensor_parallel
* /app/Megatron-LM/megatron/core/tensor_parallel/__init__.py
* /app/Megatron-LM/megatron/core/tensor_parallel/__pycache__/__init__.cpython-310.pyc.130557660816432
* /app/Megatron-LM/megatron/core/tensor_parallel/__pycache__/cross_entropy.cpython-310.pyc.130557660816720
* /app/Megatron-LM/megatron/core/tensor_parallel/cross_entropy.py
* /app/Megatron-LM/megatron/core/utils.py
* /app/workspace
* /app/workspace/__pycache__/dyana.cpython-310.pyc.130558276780192
* /app/workspace/dyana.py
* /app/workspace/entrypoint.sh
* /app/workspace/main.py
* /proc
* /root/.cache/huggingface/hub/version.txt
* /usr/bin/cat
* /usr/bin/cut
* /usr/bin/grep
* /usr/bin/hostname
* /usr/bin/nvidia-smi
* /usr/bin/python3
* /usr/bin/rm
* /usr/bin/sed
* /usr/bin/timeout
* /usr/bin/touch
* /usr/local/bin/cudaCheck
* /usr/local/cuda/compat/.570.86.10.b8ed4c508d77.checked
* /usr/local/cuda/compat/lib.real/libcuda.so.1
* /usr/local/cuda/lib64/libcublas.so.12
* /usr/local/cuda/lib64/libcublasLt.so.12
* /usr/local/cuda/lib64/libcufft.so.11
* /usr/local/cuda/lib64/libcupti.so.12
* /usr/local/cuda/lib64/libcurand.so.10
* /usr/local/cuda/lib64/libcusparse.so.12
* /usr/local/cuda/lib64/libnvJitLink.so.12
* 7688 accesses to /usr/local/lib/*
* 1717 accesses to /usr/lib/*
* 44 accesses to /lib/*
* 160 accesses to /dev/*
* 163 accesses to /proc/*
* 58 accesses to /sys/*
* 116 accesses to /etc/*
* 2 accesses to /tmp/*
Security Events:
* Dynamic code loading detected (defense-evasion, moderate
severity)
Top Level Imports:
* sympy.*: 408
* torch.*: 367
* transformers.*: 314
* huggingface_hub.*: 87
* requests.*: 74
* setuptools.*: 53
* mpmath.*: 51
* triton.*: 45
* megatron.*: 43
* urllib3.*: 32
* transformer_engine.*: 32
* onnx.*: 32
* pkg_resources.*: 31
* google.*: 30
* distutils.*: 27
* jinja2.*: 21
* yaml.*: 18
* tokenizers.*: 14
* charset_normalizer.*: 10
* PIL.*: 10
* filelock.*: 8
* xml.*: 8
* wcwidth.*: 6
* idna.*: 5
* tqdm.*: 5
* cffi.*: 5
* importlib.*: 4
* functorch.*: 4
* encodings.*: 3
* packaging.*: 3
* pyexpat.*: 3
* defusedxml.*: 3
* sentencepiece.*: 3
* safetensors.*: 3
* multiprocessing.*: 3
* concurrent.*: 2
* http.*: 2
* certifi.*: 2
* markupsafe.*: 2
* pydantic.*: 2
* email.*: 2
* flash_attn.*: 2
* html.*: 2
* tabulate.*: 2
* mimetypes
* _multibytecodec
* stringprep
* configparser
* _elementtree
* swig_runtime_data4
* filecmp
* transformer_engine_extensions
* colorsys
* timeit
* fractions
* termios
* getpass
* _lsprof
* profile
* cProfile
* pstats
* pkgutil
* shlex
* sysconfig
* getopt
* _distutils_system_mod
* _distutils_hack
* plistlib
* flash_attn_2_cuda
* fused_weight_gradient_mlp_cuda
dyana/loaders/megatron/main.py
Outdated
os.environ["TORCH_USE_CUDA_DSA"] = "0"
os.environ["PYTORCH_JIT"] = "0"  # Disable JIT at env level
os.environ["TORCH_USE_RTLD_GLOBAL"] = "1"
os.environ["TORCH_INDUCTOR_DISABLE_CUDA_GRAPH"] = "1"  # Disable CUDA graphs
vaffancuda
(gesticulates in italian)
TIL XD <3
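for anyone reading along: the env vars in the hunk above only help if they are in place before `import torch`, since settings of this kind are generally read at import time. a minimal sketch of one way to group them, assuming the four names and values from the hunk (whether each is honoured by the installed PyTorch build is not verified here). `TORCH_ENV` and `apply_torch_env` are hypothetical names, and unlike the hunk this version does not overwrite a value the caller already exported:

```python
import os
from collections.abc import Mapping

# Names and values copied from the hunk above; this sketch does not check
# that the installed PyTorch build actually reads each of them.
TORCH_ENV = {
    "TORCH_USE_CUDA_DSA": "0",
    "PYTORCH_JIT": "0",
    "TORCH_USE_RTLD_GLOBAL": "1",
    "TORCH_INDUCTOR_DISABLE_CUDA_GRAPH": "1",
}


def apply_torch_env(env: Mapping[str, str] = TORCH_ENV) -> dict[str, str]:
    """Set each variable unless it is already present; return what was set."""
    applied: dict[str, str] = {}
    for key, value in env.items():
        if key not in os.environ:
            os.environ[key] = value
            applied[key] = value
    return applied
```

calling `apply_torch_env()` at the very top of the loader, before any `import torch`, keeps the configuration in one place and lets a user override a single value from the shell.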
dyana/loaders/megatron/main.py
Outdated
# PyTorch before other imports
print("=== Configuring PyTorch ===", file=sys.stderr)
# Disable JIT compilation using available methods
if hasattr(torch._C, "_jit_set_profiling_mode"):
dyana/loaders/megatron/main.py
Outdated
profiler = Profiler(gpu=True)

if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available but required")

# Force CUDA initialization
torch.cuda.init()  # type: ignore[no-untyped-call]
torch.cuda.init()
i feel we're sending CUDA conflicting messages ...
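the hunk above lists `torch.cuda.init()` twice (possibly just the old and new side of the diff). if the goal is "initialise once, no matter how often the loader asks", a guard along these lines is one option. this is a sketch and not the PR's code: `init_cuda_once` is a hypothetical helper, and the `torch.cuda` module is passed in as an argument only so the function can be exercised on a machine without a GPU.

```python
from typing import Any


def init_cuda_once(cuda: Any, device_index: int = 0) -> bool:
    """Initialise CUDA at most once; return True if this call did the init.

    `cuda` is expected to be the `torch.cuda` module (or a stand-in that
    exposes the same four functions).
    """
    if not cuda.is_available():
        raise RuntimeError("CUDA is not available but required")

    already_initialized = cuda.is_initialized()
    if not already_initialized:
        cuda.init()

    # Selecting the device is cheap and safe to repeat.
    cuda.set_device(device_index)
    return not already_initialized
```

in the loader this would be called as `init_cuda_once(torch.cuda)`. a second call then becomes a no-op apart from re-selecting the device.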
- description: "Load a Megatron-DMC model with tokenizer:"
  command: dyana trace --loader megatron --model /path/to/model --tokenizer /path/to/tokenizer.model --size 7B
- description: "Load a Megatron-DMC model:"
  command: dyana trace --loader megatron --model /path/to/model.pt --size 7B
perfect! 👨🏻‍🍳
dyana/loaders/megatron/main.py
Outdated
@@ -1,4 +1,4 @@
-# ruff: noqa: I001, E402, F401, F821
+# ruff: noqa: I001, F401, E402, B904, F821
XD i honestly felt the same, these libraries aren't friendly with ruff 💀
torch.cuda.set_device(0)


def find_tokenizer(model_path: Path) -> Path:
yes! beautiful
taking after a true leader/ninja! ;)
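only the signature of `find_tokenizer` is visible in this thread, so the body below is a guess at the general idea and not the PR's implementation: look for a tokenizer file next to the model, then one directory up. the glob pattern `*tokenizer*.model` is an assumption chosen to match names like `tokenizer.model` and `llama-2-7b-tokenizer.model` used in the commands in this PR.

```python
from pathlib import Path

# Assumed naming convention; the real loader may accept other names.
TOKENIZER_GLOB = "*tokenizer*.model"


def find_tokenizer(model_path: Path) -> Path:
    """Return the first tokenizer file found beside the model or one level up."""
    # Accept either a checkpoint file or the directory that contains it.
    start = model_path if model_path.is_dir() else model_path.parent

    for directory in (start, start.parent):
        # Sort so the result is deterministic when several files match.
        matches = sorted(p for p in directory.glob(TOKENIZER_GLOB) if p.is_file())
        if matches:
            return matches[0]

    raise FileNotFoundError(f"no tokenizer matching {TOKENIZER_GLOB!r} near {model_path}")
```

raising `FileNotFoundError` rather than returning `None` keeps the `-> Path` annotation honest and fails early, before any GPU work starts.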
intro
this is a custom loader intended for loading the NVIDIA DMC HF collection with dyana, e.g.:
https://huggingface.co/nvidia/Llama-2-7B-DMC-4x#inference
for review
i named the loader "megatron", based off the repo from NVIDIA (i couldn't not <3) - would it make more sense for it to be named "dmc-nvidia"/"dmc" or something perhaps? open to changing this
testing
testing is being done specifically with:
nvcr.io/nvidia/pytorch:24.04-py3
& Ubuntu 22.04.5 LTS (Jammy Jellyfish)
testing validation:
docker system prune -af && dyana trace --loader megatron --allow-network --timeout 120 --model /home/ads/model_optim_rng.pt --tokenizer /home/ads/llama-2-7b-tokenizer.model --size 7B --verbose