xe: conv_v2: add Xe2/Xe3 support #2475

Open · echeresh wants to merge 2 commits into main
Conversation

echeresh (Contributor) commented:

Jira: https://jira.devtools.intel.com/browse/MFDNN-13005

Reusing XeHPC kernels for Xe2/Xe3.

echeresh added the platform:gpu-intel label (Codeowner: @oneapi-src/onednn-gpu-intel) on Jan 22, 2025
echeresh requested a review from a team as a code owner on Jan 22, 2025, 03:17
echeresh (Contributor, Author) commented:

```
make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_conv
enable benchdnn_deconv
enable benchdnn_reorder
enable benchdnn_sum
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg
```

```cpp
switch (hw_desc.hw) {
    case ngen::HW::XeHPC:
        return utils::one_of(
                hw.to_ngen(), ngen::HW::Xe2, ngen::HW::Xe3);
```
rjoursler (Contributor) commented on Jan 22, 2025:
Could we make compatibility checks against feature requirements (i.e., dpas, 2D send, etc.) rather than hardware generations? Performant strategies generally seem to behave well on multiple architectures, and by tying strategies to specific architectures we induce extra tuning work, since strategies found on platform X are not available on platform Y. What if, instead, we gated strategies by whether a performance model is associated with the platform, so that a simple model refitting step could take advantage of work done on other platforms?
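
As an illustration of this suggestion, here is a minimal sketch of a feature-based compatibility gate. All type and member names below (hw_caps_t, kernel_reqs_t, is_feature_compatible) are hypothetical, not oneDNN API:

```cpp
// Hypothetical sketch: gate kernels on required features rather than on
// hardware generations. None of these types exist in oneDNN; they only
// illustrate the shape of such a check.
struct hw_caps_t {
    bool has_dpas = false; // systolic array (dpas) available
    bool has_2d_send = false; // block 2D messages available
    int grf_size = 32; // bytes per GRF register
};

struct kernel_reqs_t {
    bool needs_dpas = false;
    bool needs_2d_send = false;
    int min_grf_size = 32;
};

// A kernel is compatible when the target provides every feature it
// needs, independently of which generation the target belongs to.
inline bool is_feature_compatible(
        const kernel_reqs_t &reqs, const hw_caps_t &caps) {
    if (reqs.needs_dpas && !caps.has_dpas) return false;
    if (reqs.needs_2d_send && !caps.has_2d_send) return false;
    return caps.grf_size >= reqs.min_grf_size;
}
```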

echeresh (Contributor, Author) replied on Jan 24, 2025:
> Could we make compatibility checks against feature requirements (i.e., dpas, 2D send, etc.) rather than hardware generations?

I think this step is essentially part of kernel_desc_t::is_supported(). Any incompatible/invalid kernels are filtered out during selection.

Gating/reusing strategies is more challenging. I don't like the idea of always requiring refitting, as it adds an extra burden:

  • There are different variations of hardware: integrated vs. discrete, different EU counts, different memory configurations. In theory, all such platforms could have performance models tuned per configuration, but that is not feasible in practice, so instead we group hardware by shared characteristics. Grouping Xe2/Xe3 together can be a viable option in this paradigm if their performance numbers correlate well.
  • During early enabling (or pre-Si enabling) we usually don't have hardware with final specs, so any performance model is unreliable and subject to significant change. Moreover, with pre-Si enabling we may not have a way to collect the needed amount of benchmark data.

At the same time, mixing performance models built for different platforms does not sound reasonable. Even if platforms correlate well, their models are not calibrated to be comparable. So, as a starting point, we can follow these rules:

  1. If platform A doesn't have any kernels with an A-specific performance model, then we use the closest platform B to evaluate the performance of A-compatible kernels.
    • This is a low-effort approach to support platforms during early enabling, or platforms that are very similar to existing ones.
  2. Once we have at least a single A-specific modeled kernel, we look for kernels with A-based performance models only.

The transition between 1 and 2 may be problematic, but this approach seems like a reasonable compromise.
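
A hedged sketch of this two-rule selection fallback; platform_t, modeled_kernel_t, and select_model_candidates are illustrative names, not existing oneDNN code:

```cpp
#include <vector>

// Hypothetical types for illustration only.
enum class platform_t { xe_hpc, xe2, xe3 };

struct modeled_kernel_t {
    platform_t model_platform; // platform the performance model was fit on
};

// Rule 2: if any kernel has a model fit on the target platform, consider
// only those. Rule 1: otherwise, fall back to models fit on the closest
// platform to rank target-compatible kernels.
std::vector<modeled_kernel_t> select_model_candidates(
        const std::vector<modeled_kernel_t> &kernels, platform_t target,
        platform_t closest) {
    std::vector<modeled_kernel_t> native, fallback;
    for (const auto &k : kernels) {
        if (k.model_platform == target)
            native.push_back(k);
        else if (k.model_platform == closest)
            fallback.push_back(k);
    }
    return native.empty() ? fallback : native;
}
```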

```cpp
ir_check(fit_tag(tensor_kind_t::c, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::a, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::b, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::c, desc, prb, exact));
ir_check(prb.is_depthwise() == desc.is_dw)
        << "Mixing depthwise/non-depthwise descriptor and problem";
```
A contributor commented:
Random spot: 2D messages have different alignment requirements on Xe2, but I don't see any check for it. Where is that check happening?

echeresh (Contributor, Author) replied:
Good point, thanks. I'll open another PR to refactor the block 2D requirements and remove them from the kernel descriptor. That way we can tailor a kernel descriptor to the target GPU platform on the fly.
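
One possible shape for such a refactoring, sketched with hypothetical names; the enum and the alignment values are placeholders, since the actual Xe2 block 2D requirements are not stated in this thread:

```cpp
#include <cstddef>

// Hypothetical sketch: keep block 2D constraints out of the kernel
// descriptor and query them per target at generation time. The enum and
// the alignment values below are placeholders, not confirmed hardware
// specifications.
enum class hw_gen_t { xe_hpc, xe2, xe3 };

struct send_2d_reqs_t {
    size_t base_align; // required surface base address alignment, bytes
    size_t pitch_align; // required row pitch alignment, bytes
};

inline send_2d_reqs_t query_send_2d_reqs(hw_gen_t hw) {
    switch (hw) {
        case hw_gen_t::xe2:
        case hw_gen_t::xe3: return {64, 16}; // placeholder values
        default: return {64, 8}; // placeholder values
    }
}

// Decide on the fly whether a layout can use block 2D messages on the
// target, instead of baking the decision into the kernel descriptor.
inline bool can_use_2d_send(hw_gen_t hw, size_t base_addr, size_t pitch) {
    send_2d_reqs_t reqs = query_send_2d_reqs(hw);
    return base_addr % reqs.base_align == 0
            && pitch % reqs.pitch_align == 0;
}
```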
