
SparseML Compression Pt 2: Load compressed weights #2184

Merged 41 commits from tensor_decompression into main on Mar 20, 2024

Conversation


@Satrat Satrat commented Mar 15, 2024

This PR implements ModelCompressor.decompress(), which decompresses the weights in a safetensors file one by one. It also includes several helper functions for reading safetensors files and working with the compressed format. See the corresponding internal docs PR for design details.
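The one-by-one decompression can be pictured as a generator. The sketch below is hypothetical (invented names, plain Python lists standing in for torch tensors, a toy `values`/`bitmask` entry format), not the actual SparseML implementation; the point it illustrates is that yielding one dense weight at a time keeps peak memory near the size of a single layer rather than the whole model:

```python
# Hypothetical sketch of streaming decompression (names invented; plain Python
# stands in for torch tensors and safetensors I/O).

def decompress(checkpoint):
    """Yield (name, dense_weight) pairs, expanding one entry at a time."""
    for name, entry in checkpoint.items():
        values = iter(entry["values"])          # packed nonzero values
        dense = [next(values) if bit else 0.0   # one value per set mask bit
                 for bit in entry["bitmask"]]
        yield name, dense

# toy checkpoint with a single compressed weight
ckpt = {"layer0.weight": {"values": [1.5, 2.0], "bitmask": [0, 1, 0, 1]}}
for name, weight in decompress(ckpt):
    print(name, weight)   # layer0.weight [0.0, 1.5, 0.0, 2.0]
```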

Note: #2177 needs to be merged first

To be implemented in a follow-up PR:

  • inferring sparsity config from model
  • SparseAutoModel save/load interface

Example

Sample code for compressing a model with 50% sparsity (see PR #2177), then reloading the compressed weights as a dense model:

from sparseml.transformers import SparseAutoModelForCausalLM
from sparseml.transformers.compression import BitmaskConfig, BitmaskCompressor
from sparseml.utils.pytorch.utils import measure_cuda_memory
from tqdm import tqdm
import torch

MODEL_PATH = "zoo:llama2-7b-gsm8k_llama2_pretrain-pruned50.oneshot"
OUTPUT_PATH = "./test_compress_output"

torch.cuda.set_device(0)
with measure_cuda_memory() as m:
    model = SparseAutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="cuda:0")
print(f"Load dense model peak GPU {m.overall_peak_memory / float(2**30):.4f} GB")

sparsity_config = BitmaskConfig()
compressor = BitmaskCompressor(config=sparsity_config)

# compresses the model using Bitmask compression
with measure_cuda_memory() as m:
    model_state_dict = model.state_dict()
    sparse_state_dict = compressor.compress(model_state_dict)

    # save the compressed model
    model.save_pretrained(
        OUTPUT_PATH, 
        safe_serialization=True, 
        state_dict=sparse_state_dict
    )

print(f"Save compressed model peak GPU {m.overall_peak_memory / float(2**30):.4f} GB")

# reload the compressed model, then decompress its weights in place
torch.cuda.set_device(1)
with measure_cuda_memory() as m:
    model_again = SparseAutoModelForCausalLM.from_pretrained(
        OUTPUT_PATH, 
        device_map="cuda:1"
    )

    # decompress() returns an iterator of (name, tensor) pairs
    dense_state_dict = compressor.decompress(OUTPUT_PATH)
    for name, data in tqdm(dense_state_dict, desc="Decompressing model"):
        BitmaskCompressor.replace_layer(name, data, model_again)

print(f"Load compressed model peak GPU {m.overall_peak_memory / float(2**30):.4f} GB")

Load dense model peak GPU 25.2276 GB
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████| 291/291 [01:28<00:00, 3.29it/s]
Save compressed model peak GPU 25.2276 GB
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 6.27it/s]
Decompressing model: 291it [01:11, 4.08it/s]
Load compressed model peak GPU 25.7159 GB
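The PR excerpt doesn't spell out the bitmask storage format itself. As a rough illustration (hypothetical, plain Python in place of torch tensors, not SparseML's actual layout), bitmask compression stores only the nonzero values plus a one-bit-per-element mask, so at 50% sparsity roughly half the values are stored at the cost of one extra bit per element:

```python
# Hypothetical sketch of the bitmask idea (plain Python, not SparseML's
# torch implementation).

def bitmask_compress(values):
    """Split a flat list of floats into packed nonzero values and a bitmask."""
    bitmask = [1 if v != 0.0 else 0 for v in values]
    packed = [v for v in values if v != 0.0]
    return packed, bitmask

def bitmask_decompress(packed, bitmask):
    """Rebuild the dense list, consuming one packed value per set mask bit."""
    packed_iter = iter(packed)
    return [next(packed_iter) if bit else 0.0 for bit in bitmask]

dense = [0.0, 1.5, 0.0, -2.0, 0.0, 0.0]
packed, mask = bitmask_compress(dense)
assert bitmask_decompress(packed, mask) == dense
assert len(packed) == 2   # only the 2 nonzero values of 6 are stored
```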

@Satrat Satrat changed the title [Draft] SparseML Compression Pt 1: Load compressed weights [Draft] SparseML Compression Pt 2: Load compressed weights Mar 15, 2024
@Satrat Satrat changed the title [Draft] SparseML Compression Pt 2: Load compressed weights SparseML Compression Pt 2: Load compressed weights Mar 15, 2024
@Satrat Satrat marked this pull request as ready for review March 15, 2024 15:59
Base automatically changed from tensor_compression to main March 20, 2024 17:13
@Satrat Satrat dismissed stale reviews from bfineran and dbogunowicz March 20, 2024 17:13

The base branch was changed.

@Satrat Satrat requested review from dbogunowicz and bfineran March 20, 2024 17:13
bfineran previously approved these changes Mar 20, 2024
dbogunowicz previously approved these changes Mar 20, 2024
@mgoin mgoin (Member) left a comment

LGTM thanks, just one line I think was missed

@Satrat Satrat dismissed stale reviews from dbogunowicz and bfineran via 0335095 March 20, 2024 20:35
@mgoin mgoin (Member) left a comment

thanks!

@mgoin mgoin merged commit 121d7fe into main Mar 20, 2024
13 of 14 checks passed
@mgoin mgoin deleted the tensor_decompression branch March 20, 2024 21:09
4 participants