Skip to content

Latest commit

 

History

History
365 lines (284 loc) · 15.2 KB

CHANGELOG.md

File metadata and controls

365 lines (284 loc) · 15.2 KB

Changelog for MIGraphX

Full documentation for MIGraphX is available at https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/.

MIGraphX 2.11 for ROCm 6.3.0

Added

  • Initial code to run on Windows
  • Support for gfx120x GPU
  • Support for FP8, and INT4
  • Support for the Log2 internal operator
  • Support for the GCC 14 compiler
  • The BitwiseAnd, Scan, SoftmaxCrossEntropyLoss, GridSample, and NegativeLogLikelihoodLoss ONNX operators
  • The MatMulNBits, QuantizeLinear/DequantizeLinear, GroupQueryAttention, SkipSimplifiedLayerNormalization, and SimpliedLayerNormalization Microsoft Contrib operators
  • Dymamic batch parameter support to OneHot operator
  • Split-K as an optional performance improvement
  • Scripts to validate ONNX models from the ONNX Model Zoo
  • GPU Pooling Kernel
  • --mlir flag to the migraphx-driver program to offload entire module to mlir
  • Fusing split-reduce with MLIR
  • Multiple outputs for the MLIR + Pointwise fusions
  • Pointwise fusions with MLIR across reshape operations
  • MIGRAPHX_MLIR_DUMP environment variable to dump MLIR modules to MXRs
  • The 3 option to MIGRAPHX_TRACE_BENCHMARKING to print the MLIR program for improved debug output
  • MIGRAPHX_ENABLE_HIPBLASLT_GEMM environment variable to call hipBlasLt libaries
  • MIGRAPHX_VERIFY_DUMP_DIFF to improve the debugging of accuracy issues
  • reduce_any and reduce_all options to the Reduce operation via Torch MIGraphX
  • Examples for RNNT, and ControlNet

Changed

  • Switched to MLIR's 3D Convolution operator.
  • MLIR is now used for Attention operations by default on gfx942 and newer ASICs.
  • Names and locations for VRM specific libraries have changed.
  • Use random mode for benchmarking GEMMs and convolutions.
  • Python version is now printed with an actual version number.

Removed

  • Disabled requirements for MIOpen and rocBlas when running on Windows.
  • Removed inaccuracte warning messages when using exhaustive-tune.
  • Remove the hard coded path in MIGRAPHX_CXX_COMPILER allowing the compiler to be installed in different locations.

Optimized

  • Improved:
    • Infrastructure code to enable better Kernel fusions with all supported data types
    • Subsequent model compile time by creating a cache for already performant kernels
    • Use of Attention fusion with models
    • Performance of the Softmax JIT kernel and of the Pooling opterator
    • Tuning operations through a new 50ms delay before running the next kernel
    • Performance of several convolution based models through an optimized NHWC layout
    • Performance for the FP8 datatype
    • GPU utilization
    • Verification tools
    • Debug prints
    • Documentation, including gpu-driver utility documentation
    • Summary section of the migrahx-driver perf command
  • Reduced model compilation time
  • Reordered some compiler passes to allow for more fusions
  • Preloaded tiles into LDS to improve performance of pointwise transposes
  • Exposed the external_data_path property in onnx_options to set the path from onnxruntime

Resolved Issues

  • Fixed a bug with gfx1030 that overwrote dpp_reduce.
  • Fixed a bug in 1arg dynamic reshape that created a failure.
  • Fixed a bug with dot_broadcast and inner_broadcast that caused compile failures.
  • Fixed a bug where some configs were failing when using exhaustive-tune.
  • Fixed the ROCM Install Guide URL.
  • Fixed an issue while building a whl package due to an apostrophe.
  • Fixed the BERT Squad example requirements file to support different versions of Python.
  • Fixed a bug that stopped the Vicuna model from compiling.
  • Fixed failures with the verify option of migraphx-driver that would cause the application to exit early.

MIGraphX 2.10 for ROCm 6.2.0

Additions

  • Added support for ONNX Runtime MIGraphX EP on Windows
  • Added FP8 Python API
  • Added examples for SD 2.1 and SDXL
  • Improved Dynamic Batch to support BERT
  • Added a --test flag in migraphx-driver to validate the installation
  • Added support for ONNX Operator: Einsum
  • Added uint8 support in ONNX Operators
  • Added fusion for group convolutions
  • Added rocMLIR conv3d support
  • Added rocgdb to the Dockerfile

Optimizations

  • Improved ONNX Model Zoo coverage
  • Reorganized memcpys with ONNX Runtime to improve performance
  • Replaced scaler multibroadcast + unsqueeze with just a multibroadcast
  • Improved MLIR kernel selection for multibroadcasted GEMMs
  • Improved details of the perf report
  • Enable mlir by default for GEMMs with small K
  • Allow specifying dot or convolution fusion for mlir with environmental flag
  • Improve performance on small reductions by doing multiple reduction per wavefront
  • Add additional algebraic simplifications for mul-add-dot sequence of operations involving constants
  • Use MLIR attention kernels in more cases
  • Enables MIOpen and CK fusions for MI300 gfx arches
  • Support for QDQ quantization patterns from Brevitas which have explicit cast/convert nodes before and after QDQ pairs
  • Added Fusion of "contiguous + pointwise" and "layout + pointwise" operations which may result in performance gains in certain cases
  • Added Fusion for "pointwise + layout" and "pointwise + contiguous" operations which may result in performance gains when using NHWC layout
  • Added Fusion for "Pointwise + concat" operation which may help in performance in certain cases
  • Fixes a bug in "concat + pointwise" fusion where output shape memory layout wasn't maintained
  • Simplifies "slice + concat" pattern in SDXL UNet
  • eliminates ZeroPoint/Shift in QuantizeLinear or DeQuantizeLinear ops if zero points values are zeros
  • Improved inference performance by fusing Reduce to Broadcast
  • Added additional information when printing the perf report
  • Improve scalar fusions when not all strides are 0
  • Added support for multi outputs in pointwise ops
  • Improve reduction fusion with reshape operators
  • Use the quantized output when an operator is used again

Fixes

  • Super Resolution model verification failed with FP16
  • Suppressed confusing messages when compiling the model
  • Mod operator failed to compile with int8 and int32 inputs
  • Prevented spawning too many threads for constant propagation when parallel STL is not enabled
  • Fixed a bug when running migraphx-driver with the --run 1 option
  • Layernorm Accuracy fix: calculations in FP32
  • Update Docker generator script to ROCm 6.1 to point at Jammy
  • Floating Point exception fix for dim (-1) in reshape operator
  • Fixed issue with int8 accuracy and models which were failing due to requiring a fourth bias input
  • Fixed missing inputs not previously handled for quantized bias for the weights, and data values of the input matrix
  • Fixed order of operations for int8 quantization which were causing inaccuracies and slowdowns
  • Removed list initializer of prefix_scan_sum which was causing issues during compilation and resulting in the incorrect constructor to be used at compile
  • Fixed the MIGRAPHX_GPU_COMPILE_PARALLEL flag to enable users to control number of threads used for parallel compilation

Changes

  • Changed default location of libraries with release specific ABI changes
  • Reorganized documentation in GitHub

Removals

  • Removed the --model flag with migraphx-driver

MIGraphX 2.9 for ROCm 6.1.0

Additions

  • Added beta version of FP8, functional, not performant
  • Created a dockerfile with MIGraphX+ONNX Runtime EP+Torch
  • Added support for the Hardmax, DynamicQuantizeLinear, Qlinearconcat, Unique, QLinearAveragePool, QLinearSigmoid, QLinearLeakyRelu, QLinearMul, IsInf operators
  • Created web site examples for Whisper, Llama-2, and Stable Diffusion 2.1
  • Created examples of using the ONNX Runtime MIGraphX Execution Provider with the InceptionV3 and Resnet50 models
  • Updated operators to support ONNX Opset 19
  • Enable fuse_pointwise and fuse_reduce in the driver
  • Add support for dot-(mul)-softmax-dot offloads to MLIR
  • Added Blas auto-tuning for GEMMs
  • Added dynamic shape support for the multinomial operator
  • Added fp16 to accuracy checker
  • Added initial code for running on Windows OS

Optimizations

  • Improved the output of migraphx-driver command
  • Documentation now shows all environment variables
  • Updates needed for general stride support
  • Enabled Asymmetric Quantization
  • Added ScatterND unsupported reduction modes
  • Rewrote softmax for better performance
  • General improvement to how quantization is performed to support INT8
  • Used problem_cache for gemm tuning
  • Improved performance by always using rocMLIR for quantized convolution
  • Improved group convolutions by using rocMLIR
  • Improved accuracy of fp16 models
  • ScatterElements unsupported reduction
  • Added concat fusions
  • Improved INT8 support to include UINT8
  • Allow reshape ops between dq and quant_op
  • Improve dpp reductions on navi
  • Have the accuracy checker print the whole final buffer
  • Added support for handling dynamic Slice and ConstantOfShape ONNX operators
  • Add support for the dilations attribute to Pooling ops
  • Add layout attribute support for LSTM operator
  • Improved performance by removing contiguous for reshapes
  • Handle all slice input variations
  • Add scales attribute parse in upsample for older opset versions
  • Added support for uneven Split operations
  • Improved unit testing to run in python virtual environments

Fixes

  • Fixed outstanding issues in autogenerated documentation
  • Update model zoo paths for examples
  • Fixed promote_literals_test by using additional if condition
  • Fixed export API symbols from dynamic library
  • Fixed bug in pad operator from dimension reduction
  • Fixed using the LD to embed files and enable by default when building shared libraries on linux
  • fixed get_version()
  • Fixed Round operator inaccuracy
  • Fixed wrong size check when axes not present for slice
  • Set the .SO version correctly

Changes

  • Cleanup LSTM and RNN activation functions
  • Placed gemm_pointwise at a higher priority than layernorm_pointwise
  • Updated README to mention the need to include GPU_TARGETS when building MIGraphX

Removals

  • Removed unused device kernels from Gather and Pad operators
  • Removed int8x4 format

MIGraphX 2.8 for ROCm 6.0.0

Additions

  • Support for MI300 GPUs
  • Support for TorchMIGraphX via PyTorch
  • Boosted overall performance by integrating rocMLIR
  • INT8 support for ONNX Runtime
  • Support for ONNX version 1.14.1
  • Added new operators: Qlinearadd, QlinearGlobalAveragePool, Qlinearconv, Shrink, CastLike, and RandomUniform
  • Added an error message for when gpu_targets is not set during MIGraphX compilation
  • Added parameter to set tolerances with migraphx-driver verify
  • Added support for MXR files > 4 GB
  • Added MIGRAPHX_TRACE_MLIR flag
  • BETA added capability for using ROCm Composable Kernels via the MIGRAPHX_ENABLE_CK=1 environment variable

Optimizations

  • Improved performance support for INT8
  • Improved time precision while benchmarking candidate kernels from CK or MLIR
  • Removed contiguous from reshape parsing
  • Updated the ConstantOfShape operator to support Dynamic Batch
  • Simplified dynamic shapes-related operators to their static versions, where possible
  • Improved debugging tools for accuracy issues
  • Included a print warning about miopen_fusion while generating mxr
  • General reduction in system memory usage during model compilation
  • Created additional fusion opportunities during model compilation
  • Improved debugging for matchers
  • Improved general debug messages

Fixes

  • Fixed scatter operator for nonstandard shapes with some models from ONNX Model Zoo
  • Provided a compile option to improve the accuracy of some models by disabling Fast-Math
  • Improved layernorm + pointwise fusion matching to ignore argument order
  • Fixed accuracy issue with ROIAlign operator
  • Fixed computation logic for the Trilu operator
  • Fixed support for the DETR model

Changes

  • Changed MIGraphX version to 2.8
  • Extracted the test packages into a separate deb file when building MIGraphX from source

Removals

  • Removed building Python 2.7 bindings

MIGraphX 2.7 for ROCm 5.7.0

Additions

  • hipRTC no longer requires dev packages for MIGraphX runtime and allows the ROCm install to be in a different directory than build time
  • Added support for multi-target execution
  • Added Dynamic Batch support with C++/Python APIs
  • Added migraphx.create_argument to Python API
  • Added dockerfile example for Ubuntu 22.04
  • Added TensorFlow supported ops in driver similar to exist onnx operator list
  • Added a MIGRAPHX_TRACE_MATCHES_FOR env variable to filter the matcher trace
  • Improved debugging by printing max,min,mean and stddev values for TRACE_EVAL = 2
  • You can now use the fast_math flag instead of ENV for GELU
  • Print message from driver if offload copy is set for compiled program

Optimizations

  • Optimized for ONNX Runtime 1.14.0
  • Improved compile times by only building for the GPU on the system
  • Improved performance of pointwise/reduction kernels when using NHWC layouts
  • Loaded specific version of the migraphx_py library
  • Annotated functions with the block size so the compiler can do a better job of optimizing
  • Enabled reshape on nonstandard shapes
  • Used half HIP APIs to compute max and min
  • Added support for broadcasted scalars to unsqueeze operator
  • Improved multiplies with dot operator
  • Handled broadcasts across dot and concat
  • Added verify namespace for better symbol resolution

Fixes

  • Resolved accuracy issues with FP16 resnet50
  • Updated cpp generator to handle inf from float
  • Fixed assertion error during verify and made DCE work with tuples
  • Fixed convert operation for NaNs
  • Fixed shape typo in API test
  • Fixed compile warnings for shadowing variable names
  • Added missing specialization for the nullptr hash function

Changees

  • Bumped version of half library to 5.6.0
  • Bumped CI to support ROCm 5.6
  • Made building tests optional
  • Replaced np.bool with bool per NumPy request

Removals

  • Removed int8x4 rocBlas calls due to deprecation
  • Removed std::reduce usage because not all operating systems support it

MIGraphX 2.5 for ROCm 5.5.0

Additions

  • Y-Model feature will store tuning information with the optimized model
  • Added Python 3.10 bindings
  • Accuracy checker tool based on ONNX runtime
  • ONNX operators parse_split, and Trilu
  • Build support for ROCm MLIR
  • Added the migraphx-driver flag to print optimizations in Python (--python)
  • Added JIT implementation of the Gather and Pad operators, which results in better handling for larger tensor sizes

Optimizations

  • Improved performance of Transformer-based models
  • Improved performance of the Pad, Concat, Gather, and Pointwise operators
  • Improved ONNX/pb file loading speed
  • Added a general optimize pass that runs several passes, such as simplify_reshapes, algebra, and DCE in a loop

Fixes

  • Improved parsing for TensorFlow Protobuf files
  • Resolved various accuracy issues with some ONNX models
  • Resolved a gcc-12 issue with MIVisionX
  • Improved support for larger sized models and batches
  • Use --offload-arch instead of --cuda-gpu-arch for the HIP compiler
  • Changes inside JIT to use float accumulator for large reduce ops of half type to avoid overflow
  • Changes inside JIT to temporarily use cosine to compute sine function

Changes

  • Changed version and location of third-party build dependencies in order to pick up fixes