xe: conv_v2: add Xe2/Xe3 support #2475
base: main
Conversation
switch (hw_desc.hw) {
case ngen::HW::XeHPC:
    return utils::one_of(
            hw.to_ngen(), ngen::HW::Xe2, ngen::HW::Xe3);
Could we make compatibility checks against feature requirements (i.e., dpas, 2D send, etc.) rather than hardware generations? It generally seems like performant strategies behave well on multiple architectures. By tying strategies to specific architectures, we induce extra tuning work, as strategies found on platform X are not available on platform Y. What if instead we gate strategies by whether a performance model is associated with the platform, so that a simple model-refitting step can be used to take advantage of work done on other platforms?
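For illustration, here is a minimal sketch of what feature-requirement gating could look like. All type and field names below are hypothetical, not existing oneDNN API; they only show the suggested direction.

```cpp
// Hypothetical feature sets: a strategy declares the features it needs
// instead of the HW generation it was tuned on.
struct hw_features_t {
    bool has_dpas = false;    // systolic array (dpas) support
    bool has_2d_send = false; // block 2D load/store messages
};

struct strategy_reqs_t {
    bool needs_dpas = false;
    bool needs_2d_send = false;
};

// A strategy is compatible with any hardware that provides the features
// it requires, regardless of generation.
bool is_compatible(const hw_features_t &hw, const strategy_reqs_t &reqs) {
    if (reqs.needs_dpas && !hw.has_dpas) return false;
    if (reqs.needs_2d_send && !hw.has_2d_send) return false;
    return true;
}
```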
> Could we make compatibility checks against feature requirements (i.e., dpas, 2D send, etc.) rather than hardware generations?
I think this step is essentially part of kernel_desc_t::is_supported(). Any incompatible/invalid kernels are filtered out during selection.
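As a rough sketch of that selection-time filtering: only the kernel_desc_t::is_supported() name comes from the actual code; the stub types, the signature, and the surrounding loop are assumptions for illustration.

```cpp
#include <vector>

struct hw_t {}; // stand-in for the target-hardware descriptor

struct kernel_desc_t {
    // Stub standing in for the real kernel_desc_t::is_supported();
    // the actual signature in oneDNN may differ.
    bool is_supported(const hw_t &) const { return true; }
};

// During selection, incompatible/invalid kernels are dropped up front.
std::vector<kernel_desc_t> filter_supported(
        const std::vector<kernel_desc_t> &candidates, const hw_t &hw) {
    std::vector<kernel_desc_t> out;
    for (const auto &d : candidates)
        if (d.is_supported(hw)) out.push_back(d);
    return out;
}
```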
Gating/reusing strategies is a more challenging thing. I don't like the idea of always requiring refitting, as it's an extra burden:
- There are different variations of hardware, e.g. integrated vs. discrete, different EU counts, or different memory configurations. In theory all such platforms could have performance models tuned per configuration, but that is not feasible in practice, so instead we group hardware by shared characteristics. Grouping Xe2/Xe3 together can be a viable option in this paradigm if their performance numbers correlate well.
- During early enabling (or pre-Si enabling) we usually don't have hardware with "full" specs, so any performance model is unreliable and subject to significant change. Moreover, with pre-Si enabling we may not have a way to collect the needed amount of benchmark data.
At the same time, mixing performance models built for different platforms does not sound reasonable. Even though platforms may correlate well, their models are not calibrated to be comparable. So as a starting point we can follow these rules:
1. If platform A doesn't have any kernels with an A-specific performance model, then we use the closest platform B to evaluate the performance of A-compatible kernels.
   - This is a low-effort way to support platforms during early enabling, or platforms that are very similar to others.
2. Once we have at least one A-specific modeled kernel, we consider kernels with A-specific performance models only.
The transition between rules 1 and 2 may be problematic, but this approach seems like a reasonable compromise.
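A minimal sketch of this two-rule selection; every name here is hypothetical, and the helpers are stubs standing in for real logic.

```cpp
#include <vector>

enum class platform_t { xehpc, xe2, xe3 }; // illustrative platform ids

struct kernel_entry_t {
    platform_t model_platform; // platform the performance model was fit on
};

// Assumed helpers: "closest" platform by HW characteristics, and a
// target-compatibility check (e.g. via kernel_desc_t::is_supported()).
platform_t closest_platform(platform_t) { return platform_t::xehpc; } // stub
bool is_compatible(const kernel_entry_t &, platform_t) { return true; } // stub

std::vector<kernel_entry_t> select_candidates(
        platform_t target, const std::vector<kernel_entry_t> &registry) {
    // Rule 2: once any target-specific modeled kernel exists, use only those.
    std::vector<kernel_entry_t> native;
    for (const auto &k : registry)
        if (k.model_platform == target) native.push_back(k);
    if (!native.empty()) return native;

    // Rule 1: otherwise evaluate target-compatible kernels using the models
    // of the closest platform.
    platform_t closest = closest_platform(target);
    std::vector<kernel_entry_t> fallback;
    for (const auto &k : registry)
        if (k.model_platform == closest && is_compatible(k, target))
            fallback.push_back(k);
    return fallback;
}
```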
ir_check(fit_tag(tensor_kind_t::c, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::a, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::b, desc, prb, exact));
ir_check(is_compatible(tensor_kind_t::c, desc, prb, exact));
ir_check(prb.is_depthwise() == desc.is_dw)
        << "Mixing depthwise/non-depthwise descriptor and problem";
Random spot: 2D messages have different alignment requirements on Xe2, but I don't see any check for that. Where is that check happening?
Good point, thanks. I'll open another PR that refactors the block 2D requirements out of the kernel descriptor. This way we can tailor a kernel descriptor to the target GPU platform on the fly.
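As a rough illustration of such a platform-tailored check: the alignment constants below are placeholders rather than the actual hardware requirements, and the enum and struct are hypothetical stand-ins kept self-contained on purpose.

```cpp
#include <cstdint>

enum class gpu_gen_t { xehpc, xe2, xe3 }; // stand-in for ngen::HW

// Hypothetical parameters of a block 2D access.
struct send_2d_params_t {
    std::uintptr_t base; // surface base address, in bytes
    int pitch;           // surface pitch, in bytes
};

// Placeholder per-generation alignment requirements; the real values must
// come from the HW spec (they differ between XeHPC and Xe2, which is the
// point of keying the check off the target platform instead of baking it
// into the kernel descriptor).
int base_align_2d(gpu_gen_t hw) {
    switch (hw) {
        case gpu_gen_t::xehpc: return 64; // placeholder value
        case gpu_gen_t::xe2:
        case gpu_gen_t::xe3: return 64;   // placeholder value
    }
    return 64;
}

bool is_2d_send_ok(gpu_gen_t hw, const send_2d_params_t &p) {
    int align = base_align_2d(hw);
    return (p.base % align == 0) && (p.pitch % align == 0);
}
```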
Jira: https://jira.devtools.intel.com/browse/MFDNN-13005
Reusing XeHPC kernels for Xe2/Xe3.