oneDNN v3.7 release notes #2481
base: rls-v3.7
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Performance Optimizations | ||
## Intel Architecture Processors | ||
|
||
* Improved fp16/bf16 softmax performance with relaxed [accumulation mode](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_accumulation_mode.html#doxid-dev-guide-attributes-accumulation-mode). | ||
* Added support and improved performance for fp8 matmul with bf16/fp16. | ||
|
||
|
||
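As a rough illustration of the trade-off behind relaxed accumulation, the sketch below emulates fp16 rounding in plain Python and compares a softmax whose intermediates stay in higher precision with one whose intermediates are rounded back to fp16. This is not oneDNN code: the helpers `to_fp16` and `softmax` are hypothetical, and the exact precision oneDNN uses in relaxed mode is an assumption here. In the library itself the mode is selected through the accumulation mode attribute described in the linked guide.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def softmax(values, relaxed=False):
    """Softmax over fp16 inputs (illustrative only).

    relaxed=False: exponentials and their running sum stay in Python floats,
                   standing in for higher-precision accumulation.
    relaxed=True:  every intermediate is rounded back to fp16 (assumed behavior).
    """
    rnd = to_fp16 if relaxed else (lambda v: v)
    m = max(values)  # subtract the maximum for numerical stability
    exps = [rnd(math.exp(v - m)) for v in values]
    total = 0.0
    for e in exps:
        total = rnd(total + e)
    # The destination is fp16 in both modes.
    return [to_fp16(e / total) for e in exps]


inputs = [to_fp16(0.1 * i) for i in range(64)]
strict_out = softmax(inputs)
relaxed_out = softmax(inputs, relaxed=True)
max_diff = max(abs(a - b) for a, b in zip(strict_out, relaxed_out))
print(f"largest difference between modes: {max_diff:.6f}")
```

The relaxed variant does less work per element at the cost of a small accuracy loss, which is the general idea behind opting in to the mode.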
## Intel Graphics Products | ||
|
||
* Introduced initial optimizations for GPUs based on Xe3 architecture. | ||
* Improved convolution performance on Intel Arc Graphics for Intel Core Ultra processors (Series 2) (formerly Lunar Lake). | ||
|
||
|
||
|
||
## AArch64-based Processors | ||
@jondea, @theComputeKid, could you please help summarizing AArch64 improvements?
@Sqvid I think you have a list of our improvements?
@vpirogov We had several changes that were merged into the 3.6.2 patch release, but this will be the first major release they are in. Should they be mentioned again or not?
@Sqvid, v3.7 release notes summarize the changes in comparison to v3.6, so let's mention these as well.
|
||
|
||
# Functionality | ||
* Introduced support for the `select` algorithm in the binary primitive. The functionality is optimized for Intel CPUs. | ||
* Enabled support for the matmul primitive with grouped quantization on weights along the N dimension. | ||
|
||
* Introduced support for fp16/bf16 compressed weights in fp32 matmul on Intel CPUs. | ||
* Introduced support for grouped scales and zero points in the reorder primitive. | ||
* Enabled support for 4D weight scales in the matmul primitive. | ||
|
||
* [experimental] Extended microkernel API: | ||
    * Introduced int4 quantization support. | ||
    * Introduced fpmath mode API. | ||
|
||
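To make "grouped quantization on weights along the N dimension" concrete, here is a minimal pure-Python sketch of the arithmetic, not oneDNN code. It assumes a K x N integer weight matrix in which each run of `group_size` consecutive columns in a row shares one scale and one zero point. That layout, and the helper names `dequantize_grouped_n` and `matmul`, are assumptions made for illustration only.

```python
def dequantize_grouped_n(q_weights, scales, zero_points, group_size):
    """Dequantize a K x N integer weight matrix with groups along N.

    scales[k][g] and zero_points[k][g] apply to columns
    g * group_size .. (g + 1) * group_size - 1 of row k (assumed layout).
    """
    out = []
    for k, row in enumerate(q_weights):
        if len(row) % group_size != 0:
            raise ValueError("N must be divisible by group_size")
        out.append([
            (q - zero_points[k][n // group_size]) * scales[k][n // group_size]
            for n, q in enumerate(row)
        ])
    return out


def matmul(a, b):
    """Plain reference matmul: (M x K) times (K x N)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Hypothetical data: K = 2, N = 4, two groups of two columns per row.
q = [[1, 2, 3, 4], [5, 6, 7, 8]]
scales = [[0.5, 2.0], [1.0, 0.25]]
zero_points = [[0, 1], [1, 0]]
weights = dequantize_grouped_n(q, scales, zero_points, group_size=2)
print(matmul([[1.0, 1.0]], weights))
```

An optimized implementation would typically fuse the dequantization into the matmul rather than materialize the full-precision weights; the two-step form is used here only because it is easier to read.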
# Usability | ||
* Relaxed lifetime requirements for memory objects created with the CPU engine and SYCL runtime. The new behavior is aligned with the GPU engine. | ||
|
||
* Improved verbose diagnostics to better identify issues during dispatching, primitive creation, and kernel creation for CPU and GPU (OpenCL implementations only) primitive implementations. | ||
|
||
* Improved verbose diagnostics to simplify debugging of nGEN fallbacks. | ||
|
||
* Enabled frame pointer support on Intel64 platforms to improve integration with profilers. | ||
|
||
# Validation | ||
* Extended benchdnn with support and validation for fp8 matmul patterns for tensor tags in RNN primitive validation. | ||
|
||
# Deprecated Functionality | ||
|
||
# Breaking Changes | ||
* Updated minimum supported CMake version to 3.13 (was 2.8.12). | ||
* Updated minimum supported GCC version to 8.0 (was 4.8). | ||
* Updated minimum supported Clang version to 11.0 (was 3.0). | ||
|
||
# Thanks to these Contributors | ||
|
||
This release contains contributions from the [project core team] as well as Michał Górny @mgorny, Fadi Arafeh @fadara01, John Osorio @kala855, Ravi Pushkar @rpushkarr, Marek Michalowski @michalowski-arm, Renato Barros Arantes @renato-arantes, Ryo Suzuki @Ryo-not-rio, Varad Ahirwadkar @varad-ahirwadkar, Tadej Ciglarič @t4c1, Nikhil Sharma @nikhilfujitsu, @taoye9, @Shreyas-fuj, @raistefintel. We would also like to thank everyone who asked questions and reported issues. | ||
|
||
|
||
[project core team]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/MAINTAINERS.md |
@sgeor255, @ShanoToni, @t4c1, @Rbiessy, could you please help with release notes content for NVIDIA backend and generic SYCL kernels?
We are primarily looking for two things: performance improvements (stuff that works faster) and new features (stuff that did not work before).
Thanks @vpirogov. We had a look but no new features nor performance improvements made it in the 3.7 branch. We'll have a few new features for the next release. @sgeor255 approved the PR as is.
To be specific, these release notes cover changes from oneDNN v3.6 code freeze (Sept 6, 2024) up until today.
Anything from this list for changes worth calling out?
Sorry I missed your answer. We got confused with what made it to the 3.7 branch. I have added a suggestion in the "Functionality" section!