Move qlinear before concat to allow output fusion #3782

shivadbhavsar · 2025-01-28T23:41:20Z

OLD:

q -> conv -> dq -> add -> relu -> q .......... -> q -> conv -> dq -> add -> relu
                           |                                                 |
                           -> step --------------------------------------> concat -> q -> conv -> ...

NEW:

q -> conv -> dq -> add -> relu -> q .......... -> q -> conv -> dq -> add -> relu -> q
                           |                                                        |
                           -> step -> q ----------------------------------------> concat -> conv -> ...

For now this reduces resnet50 int8 (NHWC enabled) run time on navi31 by ~0.05 ms. More work on the step -> q part will follow in another PR.
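The rewrite is valid because quantizelinear acts elementwise, so quantizing after the concat gives the same values as quantizing each input first and concatenating the results. The following is a minimal Python sketch of that equivalence, not the code in src/simplify_qdq.cpp. The helper `quantize_linear`, the sample tensors, and the scale values are all hypothetical. The per-channel case assumes the scale vector runs along the concat axis and is sliced at the concat boundary.

```python
def quantize_linear(values, scales, zero_point=0, qmin=-128, qmax=127):
    """Elementwise int8 quantization: saturate(round(x / scale) + zero_point)."""
    return [
        max(qmin, min(qmax, round(v / s) + zero_point))
        for v, s in zip(values, scales)
    ]


# Hypothetical 1-D tensors, one element per channel.
skip = [0.12, -0.5, 1.9]          # stands in for the `step` branch
main = [0.0, 3.3, 0.77, 100.0]    # stands in for the last `relu` output

# Per-tensor case: a single scale shared by every element.
scale = 0.05
old = quantize_linear(skip + main, [scale] * 7)             # concat -> q
new = (quantize_linear(skip, [scale] * 3)                   # q -> concat
       + quantize_linear(main, [scale] * 4))
per_tensor_matches = old == new

# Per-channel case (assumption: scales lie along the concat axis),
# so each concat input takes its own slice of the scale vector.
channel_scales = [0.05, 0.1, 0.02, 0.5, 0.25, 0.01, 1.0]
old_pc = quantize_linear(skip + main, channel_scales)
new_pc = (quantize_linear(skip, channel_scales[:3])
          + quantize_linear(main, channel_scales[3:]))
per_channel_matches = old_pc == new_pc

print(per_tensor_matches, per_channel_matches)
```

Both comparisons hold in this sketch. With q placed on each concat input, the quantize on the main branch sits directly after `relu`, which is what makes it available for output fusion.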

codecov · 2025-01-28T23:52:40Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.30%. Comparing base (79f95da) to head (d4233ed).
Report is 2 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3782   +/-   ##
========================================
  Coverage    92.29%   92.30%           
========================================
  Files          519      519           
  Lines        22233    22261   +28     
========================================
+ Hits         20520    20548   +28     
  Misses        1713     1713

src/simplify_qdq.cpp

migraphx-bot · 2025-02-05T05:34:18Z

Test	Batch	Rate new d4233e	Rate old 5dc019	Diff	Compare
torchvision-resnet50	64	3,232.80	3,233.22	-0.01%	✅
torchvision-resnet50_fp16	64	6,864.01	6,861.00	0.04%	✅
torchvision-densenet121	32	2,431.06	2,431.85	-0.03%	✅
torchvision-densenet121_fp16	32	4,179.85	4,166.12	0.33%	✅
torchvision-inceptionv3	32	1,612.44	1,612.38	0.00%	✅
torchvision-inceptionv3_fp16	32	2,673.89	2,684.23	-0.39%	✅
cadene-inceptionv4	16	748.62	749.29	-0.09%	✅
cadene-resnext64x4	16	809.11	809.05	0.01%	✅
slim-mobilenet	64	6,659.29	6,659.22	0.00%	✅
slim-nasnetalarge	64	198.94	198.92	0.01%	✅
slim-resnet50v2	64	3,424.86	3,425.06	-0.01%	✅
bert-mrpc-onnx	8	1,139.96	1,138.85	0.10%	✅
bert-mrpc-tf	1	471.55	473.11	-0.33%	✅
pytorch-examples-wlang-gru	1	428.91	468.31	-8.41%	🔴
pytorch-examples-wlang-lstm	1	399.26	394.75	1.14%	✅
torchvision-resnet50_1	1	771.48	781.23	-1.25%	✅
cadene-dpn92_1	1	414.83	411.07	0.91%	✅
cadene-resnext101_1	1	389.13	389.62	-0.13%	✅
onnx-taau-downsample	1	371.01	372.07	-0.28%	✅
dlrm-criteoterabyte	1	30.53	30.53	-0.00%	✅
dlrm-criteoterabyte_fp16	1	49.03	49.10	-0.15%	✅
agentmodel	1	7,691.72	7,684.45	0.09%	✅
unet_fp16	2	57.61	57.79	-0.32%	✅
resnet50v1_fp16	1	979.96	991.06	-1.12%	✅
resnet50v1_int8	1	789.43	767.32	2.88%	✅
bert_base_cased_fp16	64	1,172.27	1,172.46	-0.02%	✅
bert_large_uncased_fp16	32	362.25	362.48	-0.06%	✅
bert_large_fp16	1	199.40	197.81	0.81%	✅
distilgpt2_fp16	16	2,213.95	2,214.12	-0.01%	✅
yolov5s	1	515.80	516.64	-0.16%	✅
tinyllama	1	43.41	43.46	-0.10%	✅
vicuna-fastchat	1	43.74	43.82	-0.19%	✅
whisper-tiny-encoder	1	411.10	410.74	0.09%	✅
whisper-tiny-decoder	1	405.55	402.68	0.71%	✅

This build is not recommended to merge 🔴

migraphx-bot · 2025-02-05T05:34:20Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

❌dlrm-criteoterabyte: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input lS_i 307200 13 @lS_i 26 307200

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

❌vicuna-fastchat: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask 1 256 @attention_mask 1 256

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

concat qlinear initial work

57848f1

shivadbhavsar self-assigned this Jan 28, 2025

shivadbhavsar added the Perf Improve label Jan 28, 2025

generalize for per channel and add tests

383002b

shivadbhavsar marked this pull request as ready for review January 30, 2025 01:21

shivadbhavsar requested a review from causten as a code owner January 30, 2025 01:21

shivadbhavsar requested review from pfultz2, CharlieL7 and lakhinderwalia January 30, 2025 01:21

shivadbhavsar changed the title ~~concat qlinear initial work~~ Move qlinear before concat to allow output fusion Jan 30, 2025

format

c30c09b

lakhinderwalia reviewed Jan 30, 2025

src/simplify_qdq.cpp (outdated, resolved)

fix cppcheck and add used_once check

56283dc

lakhinderwalia approved these changes Jan 30, 2025

CharlieL7 approved these changes Jan 30, 2025

Merge branch 'develop' into quant_fusions

5f5de3b

shivadbhavsar mentioned this pull request Jan 31, 2025

Fuse quantizelinear for skip layers using multioutput fusions #3791

Open

2 tasks

Merge branch 'develop' into quant_fusions

d4233ed

causten merged commit 3aee3a3 into develop Feb 5, 2025
43 of 45 checks passed

causten deleted the quant_fusions branch February 5, 2025 14:21
