Composability with sparse and quantization compressors #948
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
Force-pushed from 9249158 to afc0b5f
verified decompression works for sparse and quantized model
Force-pushed from 8c3b515 to 4043b65
Force-pushed from d1dd1d6 to 98cc518
Force-pushed from c2c332a to ad7c768
This looks good. I have a few follow-up questions that I've noted. We can discuss during the work session.

Add a case to the description where `save_compressed` is `True` but the sparsity is unstructured, and note which quantized compressor is used in that case.
Increase Sparsity Threshold

Signed-off-by: Rahul Tuli <[email protected]>
Please update the table.
Ran a ton of cases and everything seems well covered and integrates well with vLLM. A few small comments.
Done!
Add Table in docstring; Add test for compressor inference
Force-pushed from 52d0f7e to 2c51f2d
Excellent work!
@rahul-tuli `infer_quantization_config` had a signature change which is now failing tests.
Force-pushed from 4038e72 to 7a8a7b6
```python
tokenizer.save_pretrained(save_dir)
```

> **Note:** This will compress the model using the quantization compressor; however, instead of using the optimal sparsity compressor, the dense sparsity compressor will be used. This affects only how the model is saved on disk and does not change the actual pruning/quantization process.
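For context, a minimal sketch of the save path this note describes, assuming the `SparseAutoModelForCausalLM` wrapper and the `save_compressed` flag discussed in this PR; the checkpoint path and save directory are placeholders, not real artifacts:

```python
# Sketch only: assumes an already pruned + quantized checkpoint on disk;
# the paths below are placeholders.
from llmcompressor.transformers import SparseAutoModelForCausalLM
from transformers import AutoTokenizer

stub = "path/to/pruned-quantized-model"  # placeholder checkpoint
model = SparseAutoModelForCausalLM.from_pretrained(stub)
tokenizer = AutoTokenizer.from_pretrained(stub)

save_dir = "model-compressed"
# With unstructured sparsity, save_compressed=True applies the quantization
# compressor but falls back to the `dense` sparsity format on disk; the
# pruning/quantization of the weights themselves is unchanged.
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```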
nit: "dense sparsity compressor" is a kind of weird/confusing concept and could use some clarification.
Is this better?
LGTM now once the tests are green
Thank you!
The failing tests are unrelated to this PR and also fail on main.
This PR addresses two key updates:

1. **Test Update**: In [PR #948](#948), a flag name was updated during the review process. However, this update wasn't reflected in the relevant test. This PR propagates the updated flag name to the associated test.
2. **Sparsity Threshold Adjustment**: As requested in [PR #948](#948), the sparsity threshold has been reduced to `0.49`.
This PR accomplishes the following:
Needs: neuralmagic/compressed-tensors#241
Choices of compressors for different cases:

**Explanation**

The compressor choice depends on `quantization_format`, `compressed_flag`, and the sparsity structure (see the sketch after the notes below).

**Notes**

- If the model's global sparsity (`global_sparsity`) is less than `0.05`, no compression configuration is returned.
- If the quantization format is `CompressionFormat.marlin_24`, the compression format defaults to `dense` regardless of sparsity structure.
- If `compressed_flag` is `True` and the sparsity structure is `TWO_FOUR`, the compression format is `sparse_24_bitmask`. (This is the only sparse compressor supported in vLLM.)
- Otherwise, the compression format is `dense`.
- `SparsityThreshold` (default `0.5`) is used to determine targets and ignores in the model by evaluating parameter sparsity.

**Additional Information**

- Layers whose parameter sparsity meets or exceeds the threshold are treated as targets per `SparsityThreshold`.
- Layers whose parameter sparsity falls below the threshold are ignored per `SparsityThreshold`.
- Targets and ignores are inferred by the `infer_sparse_targets_and_ignores` function, considering the sparsity structure and threshold.
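A minimal sketch of the selection rules listed above; the function name and string constants here are hypothetical stand-ins, not the actual llm-compressor API (the real logic lives in `quantization_format.py` and `compressed_tensors_utils.py`):

```python
# Hypothetical sketch of the format-selection rules above; names and
# string constants are illustrative only.

GLOBAL_SPARSITY_MIN = 0.05
SPARSITY_THRESHOLD = 0.5  # default SparsityThreshold


def infer_sparsity_format(global_sparsity, quantization_format,
                          compressed_flag, sparsity_structure):
    """Return the sparsity compression format, or None for no config."""
    if global_sparsity < GLOBAL_SPARSITY_MIN:
        return None  # model is barely sparse: no compression config
    if quantization_format == "marlin_24":
        return "dense"  # marlin_24 always stores dense, whatever the structure
    if compressed_flag and sparsity_structure == "2:4":
        return "sparse_24_bitmask"  # only sparse compressor supported in vLLM
    return "dense"


# Spot checks against the notes above.
assert infer_sparsity_format(0.60, "pack-quantized", True, "2:4") == "sparse_24_bitmask"
assert infer_sparsity_format(0.01, "pack-quantized", True, "2:4") is None
assert infer_sparsity_format(0.60, "marlin_24", True, "2:4") == "dense"
```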