coremltools 8.0b1
Pre-release
- For all the new features, find the updated documentation in the docs-guides.
- New utilities `coremltools.utils.MultiFunctionDescriptor()` and `coremltools.utils.save_multifunction`, for creating an `mlprogram` with multiple functions in it that can share weights. The model loading API has also been updated to load a specific function for prediction. See the first sketch below.
- Stateful Core ML models: updates to the converter to produce Core ML models with the State Type (a new type introduced in iOS18/macOS15). See the second sketch below.
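A minimal sketch of the multifunction workflow, assuming two already-converted models on disk (the paths and function names here are hypothetical):

```python
import coremltools as ct

# Describe which function, from which source mlpackage, goes into the
# combined multifunction program (paths and names are hypothetical).
desc = ct.utils.MultiFunctionDescriptor()
desc.add_function(
    "classifier.mlpackage", src_function_name="main", target_function_name="classify"
)
desc.add_function(
    "embedder.mlpackage", src_function_name="main", target_function_name="embed"
)
desc.default_function_name = "classify"

# Save a single mlpackage containing both functions; weights that are
# identical across the source models are shared.
ct.utils.save_multifunction(desc, "combined.mlpackage")

# Load a specific function for prediction.
model = ct.models.MLModel("combined.mlpackage", function_name="embed")
```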
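And a sketch of the stateful-model path: a registered torch buffer that is mutated in-place in `forward` is declared as a state via `ct.StateType` (the model here is a toy example):

```python
import numpy as np
import torch
import coremltools as ct

class Accumulator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # The registered buffer becomes the Core ML state.
        self.register_buffer("accumulator", torch.zeros(1, 10))

    def forward(self, x):
        # In-place mutation of the buffer is what makes the model stateful.
        self.accumulator.add_(x)
        return self.accumulator * 2.0

traced = torch.jit.trace(Accumulator().eval(), torch.zeros(1, 10))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 10), name="x")],
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(1, 10)), name="accumulator")],
    minimum_deployment_target=ct.target.iOS18,  # State Type requires iOS18/macOS15
)

# Each state object holds its own copy of the accumulator across predictions
# (running predictions requires macOS 15).
state = mlmodel.make_state()
out = mlmodel.predict({"x": np.ones((1, 10), dtype=np.float32)}, state=state)
```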
- `coremltools.optimize`
  - Updates to model representation (`mlprogram`) pertaining to compression:
    - Support compression with more granularities: blockwise quantization, grouped channel wise palettization
    - 4 bit weight quantization (in addition to the 8 bit quantization that was already supported)
    - 3 bit palettization (in addition to the 1, 2, 4, 6, 8 bit palettization that was already supported)
    - Support for joint compression modes (see the sketch after this list):
      - 8 bit look-up-tables for palettization
      - ability to combine weight pruning and palettization
      - ability to combine weight pruning and quantization
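As an illustrative sketch, the new granularities can be exercised through the `coremltools.optimize.coreml` configs described under "API updates" below (the model path is hypothetical):

```python
import coremltools as ct
import coremltools.optimize.coreml as cto_coreml

mlmodel = ct.models.MLModel("model.mlpackage")  # hypothetical path

# 4-bit blockwise weight quantization, one scale per block of 32 values.
quant_config = cto_coreml.OptimizationConfig(
    global_config=cto_coreml.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,
    )
)
quantized_model = cto_coreml.linear_quantize_weights(mlmodel, quant_config)

# 3-bit grouped channel-wise palettization, one LUT per group of 16 channels.
palett_config = cto_coreml.OptimizationConfig(
    global_config=cto_coreml.OpPalettizerConfig(
        mode="kmeans",
        nbits=3,
        granularity="per_grouped_channel",
        group_size=16,
    )
)
palettized_model = cto_coreml.palettize_weights(mlmodel, palett_config)
```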
  - API updates:
    - `coremltools.optimize.coreml`
      - Updated existing APIs to account for the features mentioned above
      - Support joint compression by applying compression techniques on an already compressed model
      - A new API to support activation quantization using calibration data, which can be used to take a W16A16 Core ML model and produce a W8A8 model: `ct.optimize.coreml.experimental.linear_quantize_activations` (to be promoted from the experimental to the official namespace in a future release); see the sketch after this list
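A minimal sketch of that W16A16 → W8A8 flow, assuming `mlmodel` is a W16A16 `mlprogram` and `sample_data` is a list of input dictionaries for calibration (both are assumptions here):

```python
import coremltools.optimize.coreml as cto_coreml
from coremltools.optimize.coreml import experimental as cto_experimental

# Step 1: quantize activations to 8 bits using the calibration samples.
act_config = cto_coreml.OptimizationConfig(
    global_config=cto_experimental.OpActivationLinearQuantizerConfig(
        mode="linear_symmetric"
    )
)
model_w16a8 = cto_experimental.linear_quantize_activations(
    mlmodel, act_config, sample_data
)

# Step 2: quantize weights to 8 bits, arriving at W8A8.
weight_config = cto_coreml.OptimizationConfig(
    global_config=cto_coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
)
model_w8a8 = cto_coreml.linear_quantize_weights(model_w16a8, weight_config)
```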
    - `coremltools.optimize.torch`
      - Updated existing APIs to account for the features mentioned above
      - Added new APIs for data free compression (`PostTrainingPalettizer`, `PostTrainingQuantizer`)
      - Added new APIs for calibration data based compression (`SKMPalettizer` for the sensitive k-means palettization algorithm, `layerwise_compression` for the GPTQ/SparseGPT quantization/pruning algorithms); see the sketch after this list
      - Updated the APIs and the `coremltools.convert` implementation, so that converting torch models compressed with `ct.optimize.torch` no longer requires additional pass pipeline arguments
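A sketch of the data free path followed by conversion, on a toy model; the config dict values are illustrative:

```python
import torch
import coremltools as ct
from coremltools.optimize.torch.palettization import (
    PostTrainingPalettizer,
    PostTrainingPalettizerConfig,
)

torch_model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()

# Data free 4-bit palettization of all supported layers.
config = PostTrainingPalettizerConfig.from_dict({"global_config": {"n_bits": 4}})
palettized_model = PostTrainingPalettizer(torch_model, config).compress()

# Conversion picks up the compression metadata automatically;
# no extra pass pipeline arguments are needed.
example_input = torch.rand(1, 64)
traced = torch.jit.trace(palettized_model, example_input)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example_input.shape)],
    minimum_deployment_target=ct.target.iOS18,
)
```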
- Updates to model representation (`mlprogram`): new iOS18 / macOS15 ops
  - compression related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
  - updates to the GRU op
  - new PyTorch op: `scaled_dot_product_attention`
- Experimental `torch.export` conversion support:

  ```python
  import torch
  import torchvision

  import coremltools as ct

  torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")

  # Export the torch model to an ExportedProgram, then convert it directly.
  x = torch.rand((1, 3, 224, 224))
  example_inputs = (x,)
  exported_program = torch.export.export(torch_model, example_inputs)

  coreml_model = ct.convert(exported_program)
  ```
- Various other bug fixes, enhancements, cleanups, and optimizations
Known Issues
- Conversion will fail when using certain palettization modes (e.g. int8 LUT, vector palettization) with torch models compressed using `ct.optimize.torch`
- Some of the joint compression modes, when used with the training time APIs in `ct.optimize.torch`, will result in a torch model that is not correctly converted
- The post-training palettization config for mlpackage models (`ct.optimize.coreml.OpPalettizerConfig`) does not yet have all the arguments that are supported in the `cto.torch.palettization` APIs (e.g. `lut_dtype` to get an int8 dtyped LUT, `cluster_dim` to do vector palettization, `enable_per_channel_scale` to apply per-channel scales, etc.)
- Applying symmetric quantization using the GPTQ algorithm with `ct.optimize.torch.layerwise_compression.LayerwiseCompressor` will not produce the correct quantization scales, due to a known bug. This may lead to poor accuracy for the quantized model.
Special thanks to our external contributors for this release: @teelrabbit @igeni @Cyanosite