
[Low-bit optim] Support for dcp.save() and dcp.load() #1217

Merged
merged 15 commits into from
Nov 9, 2024

Conversation

gau-nernst
Collaborator

@gau-nernst gau-nernst commented Nov 3, 2024

Fixes #1189

  • To support dcp.save(), aten.detach and aten.is_pinned are required
  • To support dcp.load():
    • When the world size does not change, no additional ops are needed
    • When the world size changes, aten.slice is required

This PR therefore adds implementations of the above three ops for all low-bit optim state subclasses in torchao, along with appropriate tests. It also does some minor housekeeping (e.g. formatting the code and removing the torch>=2.3 guard, since we only test against torch>=2.3 now).

Note: Low-bit optims are still not compatible with dcp.state_dict.get_optimizer_state_dict() due to pytorch/pytorch#139575
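To make the mechanism above concrete, here is a hypothetical, simplified sketch (plain Python, no torch dependency) of the dispatch-table pattern: each supported aten op is registered in a table that the subclass would route calls through. The names here (`LowBitState`, `ATEN_OPS`, `implements`) are illustrative only, not the actual torchao API.

```python
# Table of op handlers; a real tensor subclass would consult this
# from __torch_dispatch__. All names here are illustrative.
ATEN_OPS = {}

def implements(name):
    """Register a handler for an aten op by name."""
    def decorator(fn):
        ATEN_OPS[name] = fn
        return fn
    return decorator

class LowBitState:
    """Toy stand-in for a block-wise quantized optimizer state tensor."""
    def __init__(self, codes, scales, block_size):
        self.codes = codes          # quantized values, flattened
        self.scales = scales        # one scale per quantization block
        self.block_size = block_size

@implements("aten.detach")
def _(state):
    # detach is a no-op for this toy container: return a shallow copy
    return LowBitState(list(state.codes), list(state.scales), state.block_size)

@implements("aten.is_pinned")
def _(state):
    # plain host memory is never pinned in this sketch
    return False

@implements("aten.slice")
def _(state, dim, start, end):
    # Slicing is only supported along dim 0 and at block boundaries,
    # because each scale covers a whole quantization block.
    if dim != 0:
        raise ValueError("Only support aten.slice along the first dim")
    if start % state.block_size or end % state.block_size:
        raise ValueError("Slice must align with quantization block boundaries")
    return LowBitState(
        state.codes[start:end],
        state.scales[start // state.block_size : end // state.block_size],
        state.block_size,
    )

state = LowBitState(codes=list(range(8)), scales=[1.0, 2.0], block_size=4)
sliced = ATEN_OPS["aten.slice"](state, 0, 0, 4)
print(sliced.codes, sliced.scales)  # [0, 1, 2, 3] [1.0]
```

The slice handler is what dcp.load() would exercise when the world size changes and each rank needs a different shard of the saved state.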


pytorch-bot bot commented Nov 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1217

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 782a6b1 with merge base 0e854ec (image):

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 3, 2024
@gau-nernst gau-nernst marked this pull request as ready for review November 4, 2024 05:35
@gau-nernst gau-nernst requested review from msaroufim and awgu November 4, 2024 05:50
@msaroufim msaroufim requested a review from vkuzo November 5, 2024 03:58
Contributor

@vkuzo vkuzo left a comment


nice! for now stamping to unblock (assuming we fix CI). Lmk if you'd like a proper review - happy to take more time, would just need to catch up on low bit optimizers first.

@gau-nernst
Collaborator Author

@vkuzo Sure, we can wait until CI is fixed. (How do I know when CI is fixed? There seems to be no tracking issue at the moment.) Not urgent, unless @nighting0le01 needs this patch merged to main soon?

@msaroufim
Member

You can ignore the CPU nightly failure. It hasn't been root-caused yet but is likely a runner-specific issue.

@gau-nernst
Collaborator Author

New CI errors (seems like a runner issue too?): https://github.com/pytorch/ao/actions/runs/11734156452/job/32689712737?pr=1217 - Will try rerunning later to see if it persists.

Also, a lot of the build-wheels CI jobs are failing, e.g.

@msaroufim
Member

msaroufim commented Nov 8, 2024

The glibc error was fixed. The conda arm error is new but should be fixed; @malfet, is this related? pytorch/test-infra@709824e

@malfet

malfet commented Nov 8, 2024

> glibc error was fixed the conda arm error is new but should be fixed, @malfet is this related? pytorch/test-infra@709824e

@msaroufim yes, aarch64 build failures should have been fixed by that commit

@gau-nernst gau-nernst added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label Nov 9, 2024
@gau-nernst gau-nernst merged commit 75f52ae into pytorch:main Nov 9, 2024
17 of 18 checks passed
@gau-nernst gau-nernst deleted the optim_fsdp_save_load branch November 9, 2024 07:11
jainapurva pushed a commit that referenced this pull request Nov 11, 2024
* support dcp.save

* add test for dcp.load()

* fix test

* typo

* implement aten.slice

* skip test

* fix checks

* run ruff

* fix formatting

* remove add safe globals in test

* sort some imports

---------

Co-authored-by: Mark Saroufim <[email protected]>
jainapurva pushed a commit that referenced this pull request Nov 12, 2024
sunjiweiswift pushed a commit to sunjiweiswift/ao that referenced this pull request Nov 25, 2024
nighting0le01 pushed a commit to nighting0le01/ao that referenced this pull request Dec 5, 2024

    # input validation
    if dim != 0:
        raise ValueError("Only support aten.slice along the first dim")

@nighting0le01 nighting0le01 Dec 5, 2024


@gau-nernst this raises a ValueError when switching from TP=1, DP=8 to DP=1, TP=8. Why is it required?

Collaborator Author


Because of block-wise quantization, slicing in any dim > 0 is messy. It's doable, but messy, and it does not always work (i.e. if you slice in the middle of a quantization block, it is not possible).
I haven't used TP before, so I don't know how it does the sharding. Can you try printing out x.shape, dim, start, end? And possibly open a new issue so we can discuss there.
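The alignment constraint described above can be sketched in a few lines. This is a hypothetical illustration (not torchao code): each scale covers a fixed block of elements, so a slice that starts or ends mid-block has no whole (code, scale) blocks to map onto.

```python
# Illustrative block size; real low-bit optimizers use their own value.
BLOCK_SIZE = 4

def check_slice(start, end, block_size=BLOCK_SIZE):
    """Return the block range covered by [start, end), or raise if the
    slice cuts through a quantization block."""
    if start % block_size or end % block_size:
        raise ValueError(
            f"slice [{start}:{end}) cuts through a {block_size}-element "
            "quantization block; cannot map it to whole (code, scale) blocks"
        )
    return start // block_size, end // block_size

# aligned: elements [4, 12) map exactly to blocks 1 and 2
blocks = check_slice(4, 12)
print(blocks)  # (1, 3)
```

This is why resharding that only slices along dim 0 at block-aligned offsets works, while an arbitrary TP sharding may not.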

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories)
Development

Successfully merging this pull request may close these issues.

Cannot run FSDP2 with low bit optim from AO
7 participants