
coremltools 8.0b1

Pre-release
@YifanShenSZ released this 10 Jun 19:09
· 56 commits to main since this release
f391218

For all the new features, see the updated documentation in the docs-guides.

  • New utilities coremltools.utils.MultiFunctionDescriptor() and coremltools.utils.save_multifunction for creating an mlprogram that contains multiple functions, which can share weights. The model loading API has been updated to load a specific function for prediction.
  • Stateful Core ML models: updates to the converter to produce Core ML models with the State Type (new type introduced in iOS18/macOS15).
  • coremltools.optimize
    • Updates to model representation (mlprogram) pertaining to compression:
      • Support compression with more granularities: blockwise quantization, grouped channel-wise palettization
      • 4 bit weight quantization (in addition to 8 bit quantization that was already supported)
      • 3 bit palettization (in addition to 1,2,4,6,8 bit palettization that was already supported)
      • Support joint compression modes:
        • 8 bit Look-up-tables for palettization
        • ability to combine weight pruning and palettization
        • ability to combine weight pruning and quantization
    • API updates:
      • coremltools.optimize.coreml
        • Updated existing APIs to account for features mentioned above
        • Support joint compression by applying compression techniques on an already compressed model
        • A new API to support activation quantization using calibration data, which can be used to take a W16A16 Core ML model and produce a W8A8 model: ct.optimize.coreml.experimental.linear_quantize_activations
          • (to be upgraded from experimental to the official name space in a future release)
      • coremltools.optimize.torch
        • Updated existing APIs to account for features mentioned above
        • Added new APIs for data-free compression (PostTrainingPalettizer, PostTrainingQuantizer)
        • Added new APIs for calibration data based compression (SKMPalettizer for sensitive k-means palettization algorithm, layerwise_compression for GPTQ/sparseGPT quantization/pruning algorithm)
        • Updated the APIs and the coremltools.convert implementation so that converting torch models compressed with ct.optimize.torch no longer requires additional pass pipeline arguments.
  • iOS18 / macOS15 ops
    • compression related ops: constexpr_blockwise_shift_scale, constexpr_lut_to_dense, constexpr_sparse_to_dense, etc
    • updates to the GRU op
    • PyTorch op scaled_dot_product_attention
  • Experimental torch.export conversion support
import torch
import torchvision

import coremltools as ct

torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")

x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)
exported_program = torch.export.export(torch_model, example_inputs)

coreml_model = ct.convert(exported_program)
  • Various other bug fixes, enhancements, clean ups and optimizations
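
The multifunction utilities listed at the top of this release can be sketched as follows. This is a minimal illustration rather than an official recipe: the helper names save_shared_weight_package and load_function are hypothetical, and the sketch assumes every source .mlpackage exposes a single function named "main".

```python
from typing import Dict


def save_shared_weight_package(
    source_packages: Dict[str, str],
    default_function: str,
    destination: str,
) -> None:
    """Merge single-function mlprograms into one multifunction mlpackage.

    source_packages maps each target function name to the path of an
    existing .mlpackage whose function is assumed to be named "main".
    """
    # Imported here so the helper can be defined without coremltools installed.
    import coremltools as ct

    desc = ct.utils.MultiFunctionDescriptor()
    for function_name, package_path in source_packages.items():
        desc.add_function(
            package_path,
            src_function_name="main",
            target_function_name=function_name,
        )
    # The default function is the one used when no function name is given.
    desc.default_function_name = default_function
    ct.utils.save_multifunction(desc, destination)


def load_function(package_path: str, function_name: str):
    """Load one function of a multifunction mlpackage for prediction."""
    import coremltools as ct

    return ct.models.MLModel(package_path, function_name=function_name)
```

For example, calling save_shared_weight_package({"classifier": "classifier.mlpackage", "embedder": "embedder.mlpackage"}, "classifier", "combined.mlpackage") with two hypothetical source packages, followed by load_function("combined.mlpackage", "embedder"), would select the second function for prediction.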

Known Issues

  • Conversion will fail when using certain palettization modes (e.g. int8 LUT, vector palettization) with torch models using ct.optimize.torch
  • Some of the joint compression modes when used with the training time APIs in ct.optimize.torch will result in a torch model that is not correctly converted
  • The post-training palettization config for mlpackage models (ct.optimize.coreml.OpPalettizerConfig) does not yet have all the arguments that are supported in the cto.torch.palettization APIs, e.g. lut_dtype (to get an int8 LUT), cluster_dim (to do vector palettization), and enable_per_channel_scale (to apply a per-channel scale).
  • Applying symmetric quantization using the GPTQ algorithm with ct.optimize.torch.layerwise_compression.LayerwiseCompressor will not produce the correct quantization scales, due to a known bug. This may lead to poor accuracy for the quantized model.
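
As background for the terms used in these notes (symmetric quantization, quantization scales, blockwise granularity, 4 bit weights), here is a conceptual sketch in plain Python. It is not the coremltools or GPTQ implementation; the function names are hypothetical, and it assumes a flat weight list whose length is a multiple of the block size.

```python
from typing import List, Tuple


def quantize_symmetric_blockwise(
    weights: List[float], block_size: int, n_bits: int = 4
) -> Tuple[List[int], List[float]]:
    """Quantize weights block by block, with one scale per block."""
    q_max = 2 ** (n_bits - 1) - 1  # 7 for signed 4 bit values
    q_min = -q_max - 1  # -8 for signed 4 bit values
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Symmetric: zero maps to zero, so only a scale is needed (no offset).
        scale = max_abs / q_max if max_abs > 0 else 1.0
        scales.append(scale)
        for w in block:
            q = round(w / scale)
            quantized.append(max(q_min, min(q_max, q)))
    return quantized, scales


def dequantize_blockwise(
    quantized: List[int], scales: List[float], block_size: int
) -> List[float]:
    """Recover approximate weights by multiplying with each block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


weights = [3.5, -1.0, 0.6, 0.0, 14.0, -6.0, 2.0, 4.0]
quantized, scales = quantize_symmetric_blockwise(weights, block_size=4)
restored = dequantize_blockwise(quantized, scales, block_size=4)
```

Because each block gets its own scale, the large values in the second block do not reduce the precision available to the small values in the first block. An incorrect scale, as in the known issue above, shifts every restored weight in the affected block, which is why accuracy can degrade.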

Special thanks to our external contributors for this release: @teelrabbit @igeni @Cyanosite