Resize onnx operator: Optimization for Compute and Space performance of its linear option. #3773

lakhinderwalia · 2025-01-21T23:00:05Z

Optimize the space overhead of the linear Resize operation: for 2D images it is now 4x smaller. Previously the parser built very large data structures, growing to over 16 times the total input_pixels for a 4D tensor. These are now 4x smaller, and fewer reduction steps follow. (The compute overhead is reduced similarly.)
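
To make the size claim concrete, here is a minimal sketch of how the leading dimension of the gather-index literal could be counted. It assumes (inferred from the IR dumps in this description, not taken from the source code) that the old scheme paired lo/hi neighbours on every axis, while the new one does so only on axes whose length actually changes. `corner_count` and its parameters are hypothetical names, not MIGraphX API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the lo/hi "corner" combinations stacked on the leading axis of the
// gather-index literal.
// Assumption: with only_resized_axes == false every axis contributes a factor
// of 2 (old behaviour); with true, only axes whose length changes do.
std::size_t corner_count(const std::vector<std::size_t>& in_lens,
                         const std::vector<std::size_t>& out_lens,
                         bool only_resized_axes)
{
    assert(in_lens.size() == out_lens.size());
    std::size_t count = 1;
    for(std::size_t i = 0; i < in_lens.size(); i++)
    {
        if(!only_resized_axes || in_lens[i] != out_lens[i])
            count *= 2;
    }
    return count;
}
```

For the {1, 1, 2, 2} input and {1, 1, 4, 4} output of upsample_linear_test.onnx this gives 16 and 4, which matches the leading dimensions {16, 1, 4, 4} and {4, 1, 4, 4} in the two dumps.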

A comparison of parsing test/onnx/upsample_linear_test.onnx:
(Before)
Calculated resize-tensor size:
@4 = @literal{ ... } -> int32_type, {16, 1, 4, 4}, {16, 16, 4, 1}

Reading: ../test/onnx/upsample_linear_test.onnx
module: "main"
@0 = @literal{ ... } -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@1 = @literal{ ... } -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@2 = @literal{ ... } -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@3 = @literal{ ... } -> float_type, {8, 1, 4, 4}, {16, 16, 4, 1}
@4 = @literal{ ... } -> int32_type, {16, 1, 4, 4}, {16, 16, 4, 1}
X = @param:X -> float_type, {1, 1, 2, 2}, {4, 4, 2, 1}
@6 = @literal{1, 1, 2, 2} -> float_type, {4}, {1}
@7 = undefined -> float_type, {}, {}
@8 = reshape[dims={4}](X) -> float_type, {4}, {1}
@9 = gather[axis=0](@8,@4) -> float_type, {16, 1, 4, 4}, {16, 16, 4, 1}
@10 = slice[axes={0},starts={0},ends={8}](@9) -> float_type, {8, 1, 4, 4}, {16, 16, 4, 1}
@11 = slice[axes={0},starts={8},ends={16}](@9) -> float_type, {8, 1, 4, 4}, {16, 16, 4, 1}
@12 = sub(@11,@10) -> float_type, {8, 1, 4, 4}, {16, 16, 4, 1}
@13 = mul(@12,@3) -> float_type, {8, 1, 4, 4}, {16, 16, 4, 1}
@14 = add(@13,@10) -> float_type, {8, 1, 4, 4}, {16, 16, 4, 1}
@15 = slice[axes={0},starts={0},ends={4}](@14) -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@16 = slice[axes={0},starts={4},ends={8}](@14) -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@17 = sub(@16,@15) -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@18 = mul(@17,@2) -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@19 = add(@18,@15) -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@20 = slice[axes={0},starts={0},ends={2}](@19) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@21 = slice[axes={0},starts={2},ends={4}](@19) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@22 = sub(@21,@20) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@23 = mul(@22,@1) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@24 = add(@23,@20) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@25 = slice[axes={0},starts={0},ends={1}](@24) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@26 = slice[axes={0},starts={1},ends={2}](@24) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@27 = sub(@26,@25) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@28 = mul(@27,@0) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@29 = add(@28,@25) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@30 = @return(@29)

With this PR:
Calculated resize-tensor size:
@2 = @literal{ ... } -> int32_type, {4, 1, 4, 4}, {16, 16, 4, 1}

Reading: ../test/onnx/upsample_linear_test.onnx
module: "main"
@0 = @literal{ ... } -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@1 = @literal{ ... } -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@2 = @literal{ ... } -> int32_type, {4, 1, 4, 4}, {16, 16, 4, 1}
X = @param:X -> float_type, {1, 1, 2, 2}, {4, 4, 2, 1}
@4 = @literal{1, 1, 2, 2} -> float_type, {4}, {1}
@5 = undefined -> float_type, {}, {}
@6 = reshape[dims={4}](X) -> float_type, {4}, {1}
@7 = gather[axis=0](@6,@2) -> float_type, {4, 1, 4, 4}, {16, 16, 4, 1}
@8 = slice[axes={0},starts={0},ends={2}](@7) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@9 = slice[axes={0},starts={2},ends={4}](@7) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@10 = sub(@9,@8) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@11 = mul(@10,@1) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@12 = add(@11,@8) -> float_type, {2, 1, 4, 4}, {16, 16, 4, 1}
@13 = slice[axes={0},starts={0},ends={1}](@12) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@14 = slice[axes={0},starts={1},ends={2}](@12) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@15 = sub(@14,@13) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@16 = mul(@15,@0) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@17 = add(@16,@13) -> float_type, {1, 1, 4, 4}, {16, 16, 4, 1}
@18 = @return(@17)
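
Both dumps follow the same pattern after the gather: slice the leading axis into a lower and an upper half, then apply sub, mul and add. The sketch below mirrors that halving reduction on flat buffers. It is only an illustration under assumptions: `reduce_corners` is a hypothetical helper, the data is assumed to be corner-major, and each weight vector is assumed to match the size of one half, as the literal shapes in the dumps suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Repeatedly folds the upper half of the buffer onto the lower half with a
// linear interpolation: next = (hi - lo) * w + lo.
// weights[s] holds the per-element weights for reduction step s.
std::vector<float> reduce_corners(std::vector<float> data,
                                  const std::vector<std::vector<float>>& weights)
{
    for(const auto& w : weights)
    {
        assert(data.size() % 2 == 0);
        const std::size_t half = data.size() / 2;
        assert(w.size() == half);
        std::vector<float> next(half);
        for(std::size_t i = 0; i < half; i++)
        {
            const float lo = data[i];        // slice [0, half)
            const float hi = data[half + i]; // slice [half, end)
            next[i]        = (hi - lo) * w[i] + lo; // sub, mul, add
        }
        data = std::move(next);
    }
    return data;
}
```

With 16 corner slabs the loop runs four times (16 to 8 to 4 to 2 to 1), as in the (Before) dump; with 4 slabs it runs twice, as in the dump for this PR.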

coxuamd · 2025-01-22T05:11:47Z

Is this sort of dup of #3731?

lakhinderwalia · 2025-01-22T06:02:10Z

> Is this sort of dup of #3731?

No. Orthogonal and a more fundamental change to Resize parsing.
This PR doesn't change the recursive nature of calc_neighbor_points().

codecov · 2025-01-22T06:10:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.29%. Comparing base (1c10b4d) to head (1a11b9d).
Report is 3 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3773   +/-   ##
========================================
  Coverage    92.28%   92.29%           
========================================
  Files          519      519           
  Lines        22216    22233   +17     
========================================
+ Hits         20503    20520   +17     
  Misses        1713     1713


coxuamd · 2025-01-22T07:29:19Z

> > Is this sort of dup of #3731?
>
> No. Orthogonal and a more fundamental change to Resize parsing. This PR doesn't change the recursive nature of calc_neighbor_points().

Thanks for the explanation.

lakhinderwalia · 2025-01-22T18:19:29Z

(Background: going beyond issue #2129, the Resize op could use more optimization in its basic calculations, hence this PR.)

CharlieL7 · 2025-01-23T18:36:32Z

I think these code changes will create a merge conflict with the code in #3731, though?

src/onnx/parse_resize.cpp

kahmed10

need to address tidy warning but otherwise LGTM

migraphx-bot · 2025-01-26T21:05:47Z

Test	Batch	Rate new 1a11b9	Rate old 250304	Diff	Compare
torchvision-resnet50	64	3,233.70	3,232.80	0.03%	✅
torchvision-resnet50_fp16	64	6,874.50	6,877.79	-0.05%	✅
torchvision-densenet121	32	2,435.91	2,438.17	-0.09%	✅
torchvision-densenet121_fp16	32	4,198.58	4,199.78	-0.03%	✅
torchvision-inceptionv3	32	1,613.40	1,613.72	-0.02%	✅
torchvision-inceptionv3_fp16	32	2,687.56	2,691.74	-0.16%	✅
cadene-inceptionv4	16	749.96	750.70	-0.10%	✅
cadene-resnext64x4	16	808.85	809.07	-0.03%	✅
slim-mobilenet	64	6,661.80	6,657.07	0.07%	✅
slim-nasnetalarge	64	198.96	199.03	-0.04%	✅
slim-resnet50v2	64	3,426.55	3,429.36	-0.08%	✅
bert-mrpc-onnx	8	1,144.29	1,145.21	-0.08%	✅
bert-mrpc-tf	1	480.84	487.18	-1.30%	✅
pytorch-examples-wlang-gru	1	471.66	476.69	-1.06%	✅
pytorch-examples-wlang-lstm	1	478.45	437.02	9.48%	🔆
torchvision-resnet50_1	1	809.82	810.57	-0.09%	✅
cadene-dpn92_1	1	430.23	431.01	-0.18%	✅
cadene-resnext101_1	1	389.78	390.11	-0.09%	✅
onnx-taau-downsample	1	372.04	373.51	-0.39%	✅
dlrm-criteoterabyte	1	31.79	31.80	-0.02%	✅
dlrm-criteoterabyte_fp16	1	51.03	51.04	-0.02%	✅
agentmodel	1	8,775.62	8,550.96	2.63%	✅
unet_fp16	2	58.00	57.94	0.10%	✅
resnet50v1_fp16	1	1,035.89	1,017.55	1.80%	✅
resnet50v1_int8	1	775.53	799.64	-3.02%	🔴
bert_base_cased_fp16	64	1,172.03	1,172.67	-0.06%	✅
bert_large_uncased_fp16	32	362.47	362.53	-0.02%	✅
bert_large_fp16	1	200.79	201.02	-0.11%	✅
distilgpt2_fp16	16	2,215.95	2,217.34	-0.06%	✅
yolov5s	1	532.39	523.23	1.75%	✅
tinyllama	1	43.60	43.60	0.02%	✅
vicuna-fastchat	1	175.47	178.42	-1.65%	✅
whisper-tiny-encoder	1	411.83	411.71	0.03%	✅
whisper-tiny-decoder	1	411.48	411.62	-0.03%	✅

This build is not recommended to merge 🔴

migraphx-bot · 2025-01-26T21:05:49Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

pfultz2 · 2025-01-27T18:27:53Z

src/onnx/parse_resize.cpp

-    {
-        MIGRAPHX_THROW("PARSE_RESIZE: Shape dimension " + std::to_string(n_bits) + " exceeds " +
-                       std::to_string(std::numeric_limits<std::size_t>::digits));
-    }


The error checking shouldn't be removed.

This error check belongs in the caller's API, if anywhere. It is an exception one would hit only if the lens dimension were 64 deep.
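
For context on what the removed check guards against: the number of lo/hi combinations is computed as a power of two, and shifting a `std::size_t` by its full bit width or more is undefined behaviour in C++. A minimal sketch of such a guard follows. `permutation_count` is a hypothetical helper, `std::runtime_error` stands in for MIGRAPHX_THROW, and the exact comparison used in the original code is not shown in this thread, so `>=` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Returns 2^n_bits, refusing shift amounts that would be undefined behaviour.
std::size_t permutation_count(std::size_t n_bits)
{
    const auto digits = static_cast<std::size_t>(std::numeric_limits<std::size_t>::digits);
    if(n_bits >= digits)
    {
        throw std::runtime_error("PARSE_RESIZE: Shape dimension " + std::to_string(n_bits) +
                                 " exceeds " + std::to_string(digits));
    }
    const std::size_t count = std::size_t{1} << n_bits;
    assert((count >> n_bits) == 1); // the shift did not lose the set bit
    return count;
}
```

Whether the guard lives here or in the caller is the design question raised in this review; the sketch only shows the condition being protected.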

pfultz2 · 2025-01-27T18:29:38Z

src/onnx/parse_resize.cpp

+                auto lo           = vvv_ind[entry->second][0][e_idx];
+                auto hi           = vvv_ind[entry->second][1][e_idx];
+                for(size_t i = 0; i < permutations; i++)
+                    perm_blk[i][l_idx] = ((i & hi_cmp_bit) != 0) ? hi : lo;


What is this supposed to do? There is no explanation here. Using a bitset like the previous version would be better.
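
One reading of the quoted loop, offered as a sketch rather than as the PR's actual implementation: for k interpolated axes there are 2^k corner combinations, and the bits of the combination number i decide, per axis, whether the hi or the lo neighbour index is used. The names `perm_blk`, `hi_cmp_bit` and `permutations` are borrowed from the diff; `corner_indices`, its parameters, and the mapping of bit j to axis j are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Builds all 2^k lo/hi index combinations for k axes.
// Assumption: bit `axis` of the combination number selects hi (1) or lo (0).
std::vector<std::vector<std::size_t>> corner_indices(const std::vector<std::size_t>& lo,
                                                     const std::vector<std::size_t>& hi)
{
    assert(lo.size() == hi.size());
    const std::size_t k = lo.size();
    assert(k < static_cast<std::size_t>(std::numeric_limits<std::size_t>::digits));
    const std::size_t permutations = std::size_t{1} << k;

    std::vector<std::vector<std::size_t>> perm_blk(permutations, std::vector<std::size_t>(k));
    for(std::size_t axis = 0; axis < k; axis++)
    {
        const std::size_t hi_cmp_bit = std::size_t{1} << axis;
        for(std::size_t i = 0; i < permutations; i++)
            perm_blk[i][axis] = ((i & hi_cmp_bit) != 0) ? hi[axis] : lo[axis];
    }
    return perm_blk;
}
```

With lo = {0, 0} and hi = {1, 3} the four rows are {0, 0}, {1, 0}, {0, 3} and {1, 3}. A `std::bitset` would express the same bit test through `test(axis)`; the mask form above simply avoids fixing the bit count at compile time.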

Resize onnx op: cleanup for Compute and Space performance of linear o…

d66d57d

…ption

lakhinderwalia requested a review from causten as a code owner January 21, 2025 23:00

lakhinderwalia self-assigned this Jan 21, 2025

Update the expected IR for onnx parsing of upsample and resize (linea…

7b21008

…r) tests

lakhinderwalia requested review from CharlieL7, TedThemistokleous and kahmed10 January 22, 2025 17:39

Add a test model for a large linear Resize operation to check perform…

59f6eab

…ance

TedThemistokleous added enhancement New feature or request Perf Improve labels Jan 22, 2025

asan fix

27cd210

lakhinderwalia requested a review from mvermeulen January 23, 2025 16:36

kahmed10 reviewed Jan 23, 2025
