xe: conv_v2: add Xe2/Xe3 support #2475
base: main
Conversation
switch (hw_desc.hw) {
case ngen::HW::XeHPC:
    return utils::one_of(
            hw.to_ngen(), ngen::HW::Xe2, ngen::HW::Xe3);
Could we make compatibility checks against feature requirements (i.e., dpas, 2D send, etc.) rather than hardware generations? It generally seems like performant strategies behave well on multiple architectures. By tying strategies to specific architectures, we induce extra tuning work, as strategies found on platform X are not available on platform Y. What if instead we gate strategies by whether a performance model is associated with the platform, so that a simple model-refitting step can be used to take advantage of work done on other platforms?
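For illustration, here is a minimal sketch of what feature-requirement gating could look like. All type and field names below are hypothetical, not existing oneDNN API; they only show the suggested direction.

```cpp
// Hypothetical feature sets: a strategy declares the features it needs
// instead of the HW generation it was tuned on.
struct hw_features_t {
    bool has_dpas = false;    // systolic array (dpas) support
    bool has_2d_send = false; // block 2D load/store messages
};

struct strategy_reqs_t {
    bool needs_dpas = false;
    bool needs_2d_send = false;
};

// A strategy is compatible with any hardware that provides the features
// it requires, regardless of generation.
bool is_compatible(const hw_features_t &hw, const strategy_reqs_t &reqs) {
    if (reqs.needs_dpas && !hw.has_dpas) return false;
    if (reqs.needs_2d_send && !hw.has_2d_send) return false;
    return true;
}
```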
> Could we make compatibility checks against feature requirements (i.e., dpas, 2D send, etc.) rather than hardware generations?
I think this step is essentially part of kernel_desc_t::is_supported(). Any incompatible/invalid kernels are filtered out during selection.
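As a rough sketch of that selection-time filtering: only the kernel_desc_t::is_supported() name comes from the actual code; the stub types, the signature, and the surrounding loop are assumptions for illustration.

```cpp
#include <vector>

struct hw_t {}; // stand-in for the target-hardware descriptor

struct kernel_desc_t {
    // Stub standing in for the real kernel_desc_t::is_supported();
    // the actual signature in oneDNN may differ.
    bool is_supported(const hw_t &) const { return true; }
};

// During selection, incompatible/invalid kernels are dropped up front.
std::vector<kernel_desc_t> filter_supported(
        const std::vector<kernel_desc_t> &candidates, const hw_t &hw) {
    std::vector<kernel_desc_t> out;
    for (const auto &d : candidates)
        if (d.is_supported(hw)) out.push_back(d);
    return out;
}
```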
Gating/reusing strategies is a more challenging thing. I don't like the idea of always requiring refitting, as it's an extra burden:
- There are different variations of hardware, e.g. integrated vs. discrete, different EU counts, or different memory configurations. In theory all such platforms could have performance models tuned per configuration, but that is not feasible in practice, so instead we group hardware by shared characteristics. Grouping Xe2/Xe3 together can be a viable option in this paradigm if their performance numbers correlate well.
- During early enabling (or pre-Si enabling) we usually don't have hardware with "full" specs, so any performance model is unreliable and subject to significant change. Moreover, with pre-Si enabling we may not have a way to collect the needed amount of benchmark data.
At the same time, mixing performance models built for different platforms does not sound reasonable. Even though platforms may correlate well, their models are not calibrated to be comparable. So as a starting point we can follow these rules:
1. If platform A doesn't have any kernels with an A-specific performance model, then we use the closest platform B to evaluate the performance of A-compatible kernels.
   - This is a low-effort way to support platforms during early enabling, or platforms that are very similar to others.
2. Once we have at least one A-specific modeled kernel, we consider kernels with A-specific performance models only.
The transition between rules 1 and 2 may be problematic, but this approach seems like a reasonable compromise.
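A minimal sketch of this two-rule selection; every name here is hypothetical, and the helpers are stubs standing in for real logic.

```cpp
#include <vector>

enum class platform_t { xehpc, xe2, xe3 }; // illustrative platform ids

struct kernel_entry_t {
    platform_t model_platform; // platform the performance model was fit on
};

// Assumed helpers: "closest" platform by HW characteristics, and a
// target-compatibility check (e.g. via kernel_desc_t::is_supported()).
platform_t closest_platform(platform_t) { return platform_t::xehpc; } // stub
bool is_compatible(const kernel_entry_t &, platform_t) { return true; } // stub

std::vector<kernel_entry_t> select_candidates(
        platform_t target, const std::vector<kernel_entry_t> &registry) {
    // Rule 2: once any target-specific modeled kernel exists, use only those.
    std::vector<kernel_entry_t> native;
    for (const auto &k : registry)
        if (k.model_platform == target) native.push_back(k);
    if (!native.empty()) return native;

    // Rule 1: otherwise evaluate target-compatible kernels using the models
    // of the closest platform.
    platform_t closest = closest_platform(target);
    std::vector<kernel_entry_t> fallback;
    for (const auto &k : registry)
        if (k.model_platform == closest && is_compatible(k, target))
            fallback.push_back(k);
    return fallback;
}
```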
ir_check(fit_tag(tensor_kind_t::c, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::a, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::b, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::c, desc, prb, exact));
ir_check(prb.is_depthwise() == desc.is_dw)
        << "Mixing depthwise/non-depthwise descriptor and problem";
Random spot: 2D messages have different alignment requirements on Xe2, but I don't see any check for that. Where is that check happening?
Good point, thanks. I'll open another PR that refactors the block 2D requirements out of the kernel descriptor. This way we can tailor a kernel descriptor to the target GPU platform on the fly.
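As a rough illustration of such a platform-tailored check: the alignment constants below are placeholders rather than the actual hardware requirements, and the enum and struct are hypothetical stand-ins kept self-contained on purpose.

```cpp
#include <cstdint>

enum class gpu_gen_t { xehpc, xe2, xe3 }; // stand-in for ngen::HW

// Hypothetical parameters of a block 2D access.
struct send_2d_params_t {
    std::uintptr_t base; // surface base address, in bytes
    int pitch;           // surface pitch, in bytes
};

// Placeholder per-generation alignment requirements; the real values must
// come from the HW spec (they differ between XeHPC and Xe2, which is the
// point of keying the check off the target platform instead of baking it
// into the kernel descriptor).
int base_align_2d(gpu_gen_t hw) {
    switch (hw) {
        case gpu_gen_t::xehpc: return 64; // placeholder value
        case gpu_gen_t::xe2:
        case gpu_gen_t::xe3: return 64;   // placeholder value
    }
    return 64;
}

bool is_2d_send_ok(gpu_gen_t hw, const send_2d_params_t &p) {
    int align = base_align_2d(hw);
    return (p.base % align == 0) && (p.pitch % align == 0);
}
```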
Jira: https://jira.devtools.intel.com/browse/MFDNN-13005
Reusing XeHPC kernels for Xe2/Xe3.