Releases: neuralmagic/compressed-tensors
Compressed Tensors v0.8.1
What's Changed
- Skip accelerate tests by @kylesayrs in #208
- Remove QuantizationScheme.default_scheme by @kylesayrs in #202
- Allow ModelCompressor.from_pretrained to load from quantization_config, not compression config by @horheynm in #207
- Quantization Scheme Validation by @kylesayrs in #209
- Fix uninitialized variable in quantized compressors by @markmc in #205
- Implement aliasable mixin and alias activation ordering by @kylesayrs in #213
- Revert "Implement aliasable mixin and alias activation ordering (#213)" by @dsikka in #217
- Implement aliasable mixin and alias activation ordering (python3.9 fix) by @kylesayrs in #218
- bump by @dsikka in #226
New Contributors
Full Changelog: 0.8.0...0.8.1
Compressed Tensors v0.8.0
What's Changed
- [Observer Restructure]: Separate out scale/zp and observer init; separate out calibration from forward pass by @dsikka in #188
- Fix device allocation for MSE observer by @anmarques in #190
- drop 3.8 and add 3.12 to testing by @dhuangnm in #196
- Fix test which required accelerate, apply style by @kylesayrs in #194
- [Bugfix] Move observer and g_idx until after module in onloaded by @kylesayrs in #195
- Add sparsity structure enum by @rahul-tuli in #197
- Observer Restructure: Remove Observers, calibration, and applying frozen steps from lifecycle by @dsikka in #189
- Clean up observer defaulting logic, better error message by @kylesayrs in #200
- apply style and quality by @kylesayrs in #201
- [BugFix] Fix Marlin24 Bug by @dsikka in #203
- Bump version to v0.8.0 by @dsikka in #204
New Contributors
- @anmarques made their first contribution in #190
Full Changelog: 0.7.1...0.8.0
Compressed Tensors v0.7.1
What's Changed
- [Observer Restructure]: Remove MemoryLess Observer; use helper function for dynamic quantization by @dsikka in #187
- bump up to 0.7.1 for patch release by @dhuangnm in #192
Full Changelog: 0.7.0...0.7.1
Compressed Tensors v0.7.0
What's Changed
- Make INT8 activation PRESET_SCHEMES explicit by @mgoin in #158
- Write the current version into model configs by @mgoin in #160
- [KV-Cache] Make k_scale, v_scale as attributes of self_attn using HFCache by @horheynm in #148
- [Bugfix] Fix quant config parsing by @kylesayrs in #162
- Ignore Dense sparsity config by @rahul-tuli in #169
- fix bug by @horheynm in #170
- Replace compression_config to be quantization_config for HFQuantizer support by @dsikka in #164
- ignore list by @horheynm in #171
- switch default to release and disable pushing to pypi for now by @dhuangnm in #175
- Fix missing quant_method value by @kylesayrs in #174
- Fix ModelCompressor parsing in HF Quantizer case by @kylesayrs in #176
- Calibration Code Clarity by @kylesayrs in #168
- Add: base sparsity/quantization compressors by @rahul-tuli in #165
- Update compressors folder structure by @rahul-tuli in #166
- Update number of groups by @dsikka in #178
- Bring nightly build/test back by @dhuangnm in #179
- Remove unused function by @kylesayrs in #156
- Revert "Ignore Dense sparsity config (#169)" by @rahul-tuli in #181
- Workaround HF Quantizer apply_quantization_config misuse by @kylesayrs in #180
- bump up version to 0.7.0 by @dhuangnm in #186
Full Changelog: 0.6.0...0.7.0
Compressed Tensors v0.6.0
What's Changed
- Add simple GHA workflow to run tests by @dbogunowicz in #2
- Define BaseModels for Quantization by @Satrat in #3
- Quantization refactor by @horheynm in #5
- Apply quantization config implementation by @bfineran in #4
- decorate fake quant with torch.no_grad by @bfineran in #8
- fix observer bugs by @bfineran in #9
- [lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
- Fix Device Mismatch by @Satrat in #12
- Serialize Config from Model by @Satrat in #7
- [Observers] pull shared logic into a helper function by @bfineran in #13
- Rename the repo to compressed-tensors by @dbogunowicz in #14
- fix style post rename PR by @bfineran in #25
- Quantization Examples and Correctness Fixes by @Satrat in #26
- Fix failing GHA by @dbogunowicz in #29
- Pretrained Model Reload + SparseGPT Support by @Satrat in #31
- [Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
- Quantization Fixes by @Satrat in #35
- Final details for package by @mgoin in #36
- bump version to 0.3.1 license an packaging updates by @bfineran in #37
- Dyanmic Quantization by @bfineran in #15
- [Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
- Update target match conditions; make public by @dsikka in #44
- [Lifecycle][Tests] Feature Branch by @horheynm in #38
- [Observers] group size + channel wise + per token by @horheynm in #32
- [BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
- [Fix] Fix the messed-up test structure by @dbogunowicz in #49
- Bump the version before the release by @dbogunowicz in #50
- Compressed lifecycle implementation (INT8 only) by @bfineran in #33
- group size speedups + fixes by @bfineran in #51
- Group and Channelwise Compression Support by @Satrat in #52
- Int4 Packed Compressor by @Satrat in #47
- Fix for auto device map quantization by @Satrat in #54
- Enable generating compressed-tensors-nightly by @dbogunowicz in #53
- [BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
- Fix per_token slowdown by @Satrat in #57
- [GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
- fix group size min max tracking by adding tensor ids by @bfineran in #60
- Support for aliased scheme settings in quant config by @bfineran in #40
- Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
- Misc Fixes by @Satrat in #55
- Fix for Symmetric Zero Point Reloading by @Satrat in #64
- Additional Symmetric ZP Fix by @Satrat in #65
- Make ZP int8 instead of int64 by @Satrat in #67
- Add a function to check if a string is a preset scheme by @rahul-tuli in #66
- Rename Packed Weights by @Satrat in #63
- Fixed Grouped Quantization Reload by @Satrat in #68
- Fix incorrect loading of dtype by @eldarkurtic in #70
- Fix Python 3.8 Compatability by @Satrat in #71
- Update nightly build to run at 6pm by @dsikka in #72
- Update time for the runner by @dsikka in #74
- Fixes to enable FSDP one-shot by @dbogunowicz in #58
- Update Compression Config for HfQuantizer Compatability by @Satrat in #73
- Remove version restriction on transformers by @mgoin in #76
- remove pydantic version cap by @bfineran in #80
- reduce appropriate dim by @horheynm in #75
- Marlin24 Compressor by @Satrat in #77
- Fix GPTQ Aliases by @Satrat in #81
- initial fixes for compatibility with HFQuantizer by @bfineran in #79
- bump version to 0.4.0 by @bfineran in #83
- import is_release from version.py by @horheynm in #85
- Add release build workflow by @dhuangnm in #89
- Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
- update default symmetry to True on presets by @bfineran in #92
- Fp8 Quantization Support by @Satrat in #62
- default W4A16 alias to use group_size=128 by @bfineran in #94
- [compressor] Add packed int8 support by @dsikka in #91
- Fix Decompress kwargs by @Satrat in #100
- [Quant KV Cache] Implementation by @dbogunowicz in #86
- Fix Transient Tests by @Satrat in #101
- Speed Up Packed Compression by @Satrat in #103
- [Fix] remove tests/quantization by @dbogunowicz in #99
- Allow creating compressor when trust_remote_code=True by @dbogunowicz in #104
- Update Quantization Presets by @Satrat in #105
- [MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
- [BugFix]Fix Name Mangling Issue in compressed_tensors.utils by @rahul-tuli in #102
- Update Quantization Scheme Standards for better readability by @markurtz in #106
- quatization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
- Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
- Model Offloading Support by @Satrat in #113
- Fix Test to Account for Model Change by @Satrat in #116
- Make publish workflow manually triggerable by @rahul-tuli in #117
- bump version to 0.5.0 by @bfineran in #119
- Fix Execution Device Helper Fn by @Satrat in #120
- Do not mutate config by apply_quantization_config by @dbogunowicz in #107
- Rename Quant Method by @Satrat in #122
- Revert Config Change by @Satrat in #124
- Bug Fix for Calibration Setup by @Satrat in https://github.com/neuralmagic/compressed-tenso...
Compressed Tensors v0.5.0
What's Changed
- Add simple GHA workflow to run tests by @dbogunowicz in #2
- Define BaseModels for Quantization by @Satrat in #3
- Quantization refactor by @horheynm in #5
- Apply quantization config implementation by @bfineran in #4
- decorate fake quant with torch.no_grad by @bfineran in #8
- fix observer bugs by @bfineran in #9
- [lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
- Fix Device Mismatch by @Satrat in #12
- Serialize Config from Model by @Satrat in #7
- [Observers] pull shared logic into a helper function by @bfineran in #13
- Rename the repo to compressed-tensors by @dbogunowicz in #14
- fix style post rename PR by @bfineran in #25
- Quantization Examples and Correctness Fixes by @Satrat in #26
- Fix failing GHA by @dbogunowicz in #29
- Pretrained Model Reload + SparseGPT Support by @Satrat in #31
- [Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
- Quantization Fixes by @Satrat in #35
- Final details for package by @mgoin in #36
- bump version to 0.3.1 license an packaging updates by @bfineran in #37
- Dyanmic Quantization by @bfineran in #15
- [Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
- Update target match conditions; make public by @dsikka in #44
- [Lifecycle][Tests] Feature Branch by @horheynm in #38
- [Observers] group size + channel wise + per token by @horheynm in #32
- [BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
- [Fix] Fix the messed-up test structure by @dbogunowicz in #49
- Bump the version before the release by @dbogunowicz in #50
- Compressed lifecycle implementation (INT8 only) by @bfineran in #33
- group size speedups + fixes by @bfineran in #51
- Group and Channelwise Compression Support by @Satrat in #52
- Int4 Packed Compressor by @Satrat in #47
- Fix for auto device map quantization by @Satrat in #54
- Enable generating compressed-tensors-nightly by @dbogunowicz in #53
- [BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
- Fix per_token slowdown by @Satrat in #57
- [GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
- fix group size min max tracking by adding tensor ids by @bfineran in #60
- Support for aliased scheme settings in quant config by @bfineran in #40
- Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
- Misc Fixes by @Satrat in #55
- Fix for Symmetric Zero Point Reloading by @Satrat in #64
- Additional Symmetric ZP Fix by @Satrat in #65
- Make ZP int8 instead of int64 by @Satrat in #67
- Add a function to check if a string is a preset scheme by @rahul-tuli in #66
- Rename Packed Weights by @Satrat in #63
- Fixed Grouped Quantization Reload by @Satrat in #68
- Fix incorrect loading of dtype by @eldarkurtic in #70
- Fix Python 3.8 Compatability by @Satrat in #71
- Update nightly build to run at 6pm by @dsikka in #72
- Update time for the runner by @dsikka in #74
- Fixes to enable FSDP one-shot by @dbogunowicz in #58
- Update Compression Config for HfQuantizer Compatability by @Satrat in #73
- Remove version restriction on transformers by @mgoin in #76
- remove pydantic version cap by @bfineran in #80
- reduce appropriate dim by @horheynm in #75
- Marlin24 Compressor by @Satrat in #77
- Fix GPTQ Aliases by @Satrat in #81
- initial fixes for compatibility with HFQuantizer by @bfineran in #79
- bump version to 0.4.0 by @bfineran in #83
- import is_release from version.py by @horheynm in #85
- Add release build workflow by @dhuangnm in #89
- Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
- update default symmetry to True on presets by @bfineran in #92
- Fp8 Quantization Support by @Satrat in #62
- default W4A16 alias to use group_size=128 by @bfineran in #94
- [compressor] Add packed int8 support by @dsikka in #91
- Fix Decompress kwargs by @Satrat in #100
- [Quant KV Cache] Implementation by @dbogunowicz in #86
- Fix Transient Tests by @Satrat in #101
- Speed Up Packed Compression by @Satrat in #103
- [Fix] remove tests/quantization by @dbogunowicz in #99
- Allow creating compressor when trust_remote_code=True by @dbogunowicz in #104
- Update Quantization Presets by @Satrat in #105
- [MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
- [BugFix]Fix Name Mangling Issue in compressed_tensors.utils by @rahul-tuli in #102
- Update Quantization Scheme Standards for better readability by @markurtz in #106
- quatization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
- Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
- Model Offloading Support by @Satrat in #113
- Fix Test to Account for Model Change by @Satrat in #116
- Make publish workflow manually triggerable by @rahul-tuli in #117
- bump version to 0.5.0 by @bfineran in #119
- [Cherry Pick] dont set quantization data on reload (#123) by @Satrat in #125
New Contributors
- @mgoin made their first contribution in #36
- @dsikka made their first contribution in #44
- @rahul-tuli made their first contribution in #48
- @eldarkurtic made their first contribution in https://gith...