End-to-End examples with MIGraphX #154

Open

attila-dusnoki-htec opened this issue Nov 15, 2023 · 5 comments

Tracking issue for creating complex end-to-end (E2E) example apps using MIGraphX.


attila-dusnoki-htec commented Nov 15, 2023

Llama-2

As mentioned here

To test it with MIGraphX, we can update these two apps:

Currently, this version of Llama-2 does not compile.


attila-dusnoki-htec commented Nov 15, 2023

Whisper

ORT version

The original repo uses PyTorch, but there is a repo here with ONNX Runtime.
To use ONNX Runtime with MIGraphX (without modifying the code), ONNX Runtime can be built with MIGraphX as an execution provider.
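
A minimal sketch of selecting that provider from Python once such a build is available (the model path, input name, and shapes below are placeholders, not taken from the repos above):

import numpy as np
import onnxruntime as ort

# Ask ORT for the MIGraphX execution provider, falling back to CPU.
session = ort.InferenceSession(
    "whisper_encoder.onnx",
    providers=["MIGraphXExecutionProvider", "CPUExecutionProvider"],
)

# Dummy mel-spectrogram input; real shapes depend on the exported model.
mel = np.zeros((1, 80, 3000), dtype=np.float32)
outputs = session.run(None, {"input_features": mel})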

Currently only the encoder compiles. The decoder fails with the following:
/code/AMDMIGraphX/src/onnx/parse_matmul.cpp:78: parse: PARSE_MATMUL: dynamic shape broadcasting not supported

Hugging Face

Model used

It requires modifications to export it with Optimum; the attention_mask input is not exposed by default.

A WIP example script to use the model.
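
For reference, a rough sketch of the unmodified Optimum export path that those changes would start from (the model id and output directory are placeholders):

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

# Export the PyTorch checkpoint to ONNX through Optimum.
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", export=True)
model.save_pretrained("whisper-onnx")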

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🆕 New to 🔖 Ready in MIGraphX ONNX support Nov 15, 2023

Stable Diffusion

Hugging Face model

Python

PyTorch example in Python
MIGraphX example in Python

C++

GGML example in C++
MIGraphX example in C++

Docker

This Dockerfile can be used to test these examples.
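
For orientation, a minimal sketch of running one of the exported ONNX models through the MIGraphX Python API (the file name, input name, and shapes are placeholders):

import numpy as np
import migraphx

# Parse and compile the ONNX model for the GPU target.
prog = migraphx.parse_onnx("unet/model.onnx")
prog.compile(migraphx.get_target("gpu"))

# Feed a dummy latent; real inputs come from the diffusion pipeline.
sample = np.random.randn(1, 4, 64, 64).astype(np.float32)
result = prog.run({"sample": migraphx.argument(sample)})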

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🔖 Ready to 🏗 In progress in MIGraphX ONNX support Nov 23, 2023
@attila-dusnoki-htec attila-dusnoki-htec self-assigned this Nov 23, 2023

attila-dusnoki-htec commented Dec 8, 2023

FP16

Currently not everything works with half precision.

Stable Diffusion

TextEncoder and VAE-Decoder work

UNet produces NaNs

The problem occurs here:

MIGRAPHX_TRACE_EVAL=2 /code/AMDMIGraphX/build/bin/migraphx-driver verify models/sd21-onnx/unet/model.onnx --input-dim @sample 1 4 64 64 @encoder_hidden_states 1 77 1024 @timestep 1 --fp16 --fill1 timestep

Run instruction: @2979 = slice[axes={0},starts={0},ends={5}](@2976) -> half_type, {5, 4096, 64}, {262144, 64, 1}, target_id=0
Time: 0.01197ms, 0.01274ms
Output has normal
Output: -193.875, 156.375, 140.5, 96.4375, 141, ..., 92.8125, 43.6562, 40.5938, 103.25, -76.4375
Min value: -399.5, Max value: 466.5, Mean: 5.80163, StdDev: 146.348
Run instruction: @2980 = load[offset=190006400,end=357778560](@1) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
Time: 0.00414ms, 0.00445ms
Run instruction: @2981 = gpu::gemm[alpha=0.125,beta=0,compute_fp32=1,trans_batch=0,solution_idx=0](@2979,@2978,@2980) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
Time: 0.112007ms, 0.399038ms
Output has inf, normal
Output: -55616, -54784, -53408, -52640, -53888, ..., -23792, -23520, -22976, -22784, -22592
Min value: -inf, Max value: 28624, Mean: -inf, StdDev: -nan
Run instruction: @2982 = load[offset=22234240,end=190006400](@1) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
Time: 0.00726ms, 0.00815ms
Run instruction: @2983 = gpu::code_object[code_object=10496,symbol_name=softmax_kernel,global=5242880,local=256,](@2981,@2982) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
Time: 0.046599ms, 0.32677ms
Output has normal, nan, zero
Output: 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0
Min value: 0, Max value: 1, Mean: -nan, StdDev: -nan


@2968 = load[offset=9127040,end=11748480](@1) -> half_type, {1, 4096, 320}, {1310720, 320, 1}, target_id=0
@2969 = gpu::code_object[code_object=10768,symbol_name=add_reduce_mean_sub_pow_reduce_mean_add_sqrt_div_kernel,global=524288,local=128,](@2967,@2963,@2968) -> half_type, {1, 4096, 320}, {1310720, 320, 1}, target_id=0
@2970 = load[offset=22234240,end=30098560](@1) -> half_type, {1, 4096, 960}, {3932160, 960, 1}, target_id=0
@2971 = gpu::gemm[alpha=1,beta=1,compute_fp32=1,trans_batch=0,solution_idx=0](@2969,@2964,@2965,@2970) -> half_type, {1, 4096, 960}, {3932160, 960, 1}, target_id=0
@2972 = reshape_lazy[dims={1, 4096, 15, 64}](@2971) -> half_type, {1, 4096, 15, 64}, {3932160, 960, 64, 1}, target_id=0
@2973 = transpose[permutation={0, 2, 1, 3}](@2972) -> half_type, {1, 15, 4096, 64}, {3932160, 64, 960, 1}, target_id=0
@2974 = load[offset=14369920,end=22234240](@1) -> half_type, {1, 15, 4096, 64}, {3932160, 262144, 64, 1}, target_id=0
@2975 = gpu::code_object[code_object=8712,symbol_name=contiguous_kernel,global=983040,local=1024,](@2973,@2974) -> half_type, {1, 15, 4096, 64}, {3932160, 262144, 64, 1}, target_id=0
@2976 = reshape_lazy[dims={15, 4096, 64}](@2975) -> half_type, {15, 4096, 64}, {262144, 64, 1}, target_id=0
@2977 = slice[axes={0},starts={5},ends={10}](@2976) -> half_type, {5, 4096, 64}, {262144, 64, 1}, target_id=0
@2978 = transpose[permutation={0, 2, 1}](@2977) -> half_type, {5, 64, 4096}, {262144, 1, 64}, target_id=0
@2979 = slice[axes={0},starts={0},ends={5}](@2976) -> half_type, {5, 4096, 64}, {262144, 64, 1}, target_id=0
@2980 = load[offset=190006400,end=357778560](@1) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
@2981 = gpu::gemm[alpha=0.125,beta=0,compute_fp32=1,trans_batch=0,solution_idx=0](@2979,@2978,@2980) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
@2982 = load[offset=22234240,end=190006400](@1) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
@2983 = gpu::code_object[code_object=10496,symbol_name=softmax_kernel,global=5242880,local=256,](@2981,@2982) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
@2984 = slice[axes={0},starts={10},ends={15}](@2976) -> half_type, {5, 4096, 64}, {262144, 64, 1}, target_id=0
@2985 = load[offset=9127040,end=11748480](@1) -> half_type, {5, 4096, 64}, {262144, 64, 1}, target_id=0
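
The failing pattern is the scaled Q·K^T GEMM (@2981): its outputs already sit around ±5e4, past the fp16 limit of ±65504, so they saturate to -inf, and the following softmax (@2983) then produces NaNs. A small NumPy sketch of the same failure mode (the values are illustrative, not taken from the model):

import numpy as np

# A 64-deep dot product of values around +/-90 already exceeds the fp16 range.
q = np.full((8, 64), 90.0, dtype=np.float16)
k = np.full((64, 8), -90.0, dtype=np.float16)

scores = np.float16(0.125) * (q @ k)   # q @ k is ~-518400, stored as -inf in fp16
print(np.isinf(scores).all())          # True

# Softmax over a row of -inf is 0/0 -> NaN, matching the trace above.
e = np.exp(scores - scores.max(axis=1, keepdims=True))
print(np.isnan(e / e.sum(axis=1, keepdims=True)).all())   # True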

Ref

The reference implementation shows the same issue:

fp32

Run instruction: @6525 = ref::dot(@6521,@6524) -> float_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
Time: 1561.65ms, 1561.65ms
Output has normal
Output: -453921, -447240, -438036, -430549, -439314, ..., -192728, -190610, -185909, -184388, -183079
Min value: -742234, Max value: 230521, Mean: -368671, StdDev: 153924

fp16

Run instruction: @4528 = ref::dot(@4517,@4527) -> half_type, {5, 4096, 4096}, {16777216, 4096, 1}, target_id=0
Time: 1508.01ms, 1508.01ms
Output has normal, inf
Output: -inf, -inf, -inf, -inf, -inf, ..., -inf, -inf, -inf, -inf, -inf
Min value: -inf, Max value: inf, Mean: -nan, StdDev: -nan

Llama-2

Looking at the results, the fp32 run also has inf and nan values, but these come from the attention masking and are expected.

Run instruction: @640 = ref::multibroadcast[out_lens={1, 32, 256, 256},out_dyn_dims={}](@562) -> float_type, {1, 32, 256, 256}, {65536, 0, 256, 1}, target_id=0
Time: 0.00487ms, 0.00491ms
Output has inf, normal, zero
Output: 0, -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38, ..., -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38
Min value: -inf, Max value: 0, Mean: -inf, StdDev: -nan
Run instruction: @641 = ref::contiguous(@640) -> float_type, {1, 32, 256, 256}, {2097152, 65536, 256, 1}, target_id=0
Time: 47.8585ms, 47.8587ms
Output has inf, normal, zero
Output: 0, -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38, ..., -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38
Min value: -inf, Max value: 0, Mean: -inf, StdDev: -nan
Run instruction: @642 = ref::add(@639,@641) -> float_type, {1, 32, 256, 256}, {2097152, 65536, 256, 1}, target_id=0
Time: 8.60302ms, 8.60312ms
Output has inf, normal
Output: 2.53309, -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38, ..., -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38, -3.40282e+38
Min value: -inf, Max value: 9.41974, Mean: -inf, StdDev: -nan
Run instruction: @643 = ref::softmax[axis=3](@642) -> float_type, {1, 32, 256, 256}, {2097152, 65536, 256, 1}, target_id=0
Time: 7.95904ms, 7.95916ms
Output has zero, normal
Output: 1, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0
Min value: 0, Max value: 1, Mean: 0.00390625, StdDev: 0.0240073

The fp16 version has them as well:

Run instruction: @355 = ref::multibroadcast[out_lens={1, 32, 256, 256},out_dyn_dims={}](@287) -> half_type, {1, 32, 256, 256}, {65536, 0, 256, 1}, target_id=0
Time: 0.0025ms, 0.00254ms
Output has inf, normal, zero
Output: 0, -65504, -65504, -65504, -65504, ..., -65504, -65504, -65504, -65504, -65504
Min value: -inf, Max value: 0, Mean: -inf, StdDev: -nan
Run instruction: @356 = ref::contiguous(@355) -> half_type, {1, 32, 256, 256}, {2097152, 65536, 256, 1}, target_id=0
Time: 43.9691ms, 43.9692ms
Output has inf, normal, zero
Output: 0, -65504, -65504, -65504, -65504, ..., -65504, -65504, -65504, -65504, -65504
Min value: -inf, Max value: 0, Mean: -inf, StdDev: -nan
Run instruction: @357 = ref::add(@354,@356) -> half_type, {1, 32, 256, 256}, {2097152, 65536, 256, 1}, target_id=0
Time: 2.98311ms, 2.98321ms
Output has zero, inf, normal
Output: 2.53516, -65472, -65472, -65472, -65472, ..., -65504, -65504, -65504, -65504, -65504
Min value: -inf, Max value: 9.42188, Mean: -inf, StdDev: -nan
Run instruction: @358 = ref::softmax[axis=3](@357) -> half_type, {1, 32, 256, 256}, {2097152, 65536, 256, 1}, target_id=0
Time: 5.48778ms, 5.48788ms
Output has zero, normal
Output: 1, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0
Min value: 0, Max value: 1, Mean: 0.00390506, StdDev: 0.0239995
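
For intuition, a minimal NumPy sketch of why those masked rows are expected: the causal mask adds a huge negative constant to the disallowed positions, and the softmax still returns a valid distribution (the numbers are illustrative):

import numpy as np

scores = np.array([2.5, 1.0, 0.5, -0.3], dtype=np.float32)
mask = np.array([0.0, -3.40282e38, -3.40282e38, -3.40282e38], dtype=np.float32)

masked = scores + mask                  # masked positions become hugely negative
probs = np.exp(masked - masked.max())
probs /= probs.sum()
print(probs)                            # ~[1, 0, 0, 0], as in the trace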

But there are pow operations whose results get out of range at the lower precision.

The first 6 of 65 pows with fp32:

Run instruction: @566 = ref::pow(@563,@565) -> float_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 4.5869ms, 4.58699ms
Output has zero, normal
Output: 3.38076e-06, 1.45519e-05, 9.24105e-07, 8.58307e-06, 1.59824e-05, ..., 2.50679e-11, 5.13012e-12, 6.96332e-13, 4.14389e-11, 7.99361e-13
Min value: 0, Max value: 0.0178995, Mean: 4.2016e-06, StdDev: 6.058e-05

Run instruction: @657 = ref::pow(@654,@656) -> float_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 3.02639ms, 3.0265ms
Output has normal
Output: 9.25549e-05, 0.00138359, 5.57778e-05, 0.000400907, 0.00149003, ..., 5.14963e-05, 0.000458215, 1.88229e-06, 1.4133e-05, 0.000107119
Min value: 1.5283e-12, Max value: 0.564558, Mean: 0.000190222, StdDev: 0.0029691

Run instruction: @689 = ref::pow(@686,@688) -> float_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 3.2122ms, 3.21228ms
Output has normal
Output: 0.000135018, 0.000127227, 0.00313293, 0.0010952, 0.0218154, ..., 0.00161709, 0.20428, 6.98841e-05, 0.00672539, 0.00149917
Min value: 3.36059e-12, Max value: 22.6929, Mean: 0.0111376, StdDev: 0.255466

Run instruction: @780 = ref::pow(@777,@779) -> float_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 2.8067ms, 2.80677ms
Output has normal
Output: 0.000343903, 0.00054023, 0.00646422, 0.00035014, 0.0129346, ..., 6.03671e-05, 0.142149, 0.00201874, 0.00681912, 0.00107211
Min value: 2.72005e-15, Max value: 13.7998, Mean: 0.00849891, StdDev: 0.200751

Run instruction: @812 = ref::pow(@809,@811) -> float_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 3.50274ms, 3.50282ms
Output has normal
Output: 0.00488131, 0.261523, 0.00412025, 0.00380722, 0.00340024, ..., 1.33656e-05, 0.0975391, 0.00262303, 0.0078206, 0.00041745
Min value: 1.38778e-15, Max value: 569485, Mean: 2.79251, StdDev: 629.777

Run instruction: @903 = ref::pow(@900,@902) -> float_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 2.96483ms, 2.96493ms
Output has normal
Output: 0.00556073, 0.258198, 0.00398735, 0.00294377, 0.00266523, ..., 0.0058447, 0.124247, 1.0684e-05, 0.00872876, 0.000976011
Min value: 4.996e-16, Max value: 569493, Mean: 2.80136, StdDev: 629.854

And the same ones with fp16:

Run instruction: @290 = ref::pow(@271,@289) -> half_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 8.88927ms, 8.88934ms
Output has zero, normal
Output: 3.33786e-06, 1.45435e-05, 8.9407e-07, 8.58307e-06, 1.5974e-05, ..., 0, 0, 0, 0, 0
Min value: 0, Max value: 0.0178986, Mean: 4.20052e-06, StdDev: 6.05707e-05

Run instruction: @367 = ref::pow(@366,@289) -> half_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 1.82259ms, 1.82267ms
Output has zero, normal
Output: 9.22084e-05, 0.00138378, 5.56707e-05, 0.000400782, 0.00148964, ..., 5.126e-05, 0.00045681, 1.84774e-06, 1.40667e-05, 0.000106812
Min value: 0, Max value: 0.5625, Mean: 0.000189523, StdDev: 0.00295725

Run instruction: @392 = ref::pow(@391,@289) -> half_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 1.93096ms, 1.93115ms
Output has zero, normal
Output: 0.000134945, 0.000124693, 0.0031147, 0.00109196, 0.0217438, ..., 0.00160027, 0.203247, 6.96182e-05, 0.00668716, 0.00148964
Min value: 0, Max value: 22.5938, Mean: 0.0110913, StdDev: 0.254561

Run instruction: @453 = ref::pow(@452,@289) -> half_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 1.87675ms, 1.87683ms
Output has zero, normal
Output: 0.000343561, 0.000535011, 0.00643921, 0.000348806, 0.0128708, ..., 6.19888e-05, 0.141357, 0.00200653, 0.00676727, 0.00106621
Min value: 0, Max value: 13.75, Mean: 0.0084596, StdDev: 0.199949

Run instruction: @478 = ref::pow(@477,@289) -> half_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 1.87855ms, 1.87876ms
Output has zero, inf, normal
Output: 0.00484085, 0.259766, 0.00411987, 0.00378418, 0.00338936, ..., 1.39475e-05, 0.0970459, 0.00259972, 0.00775528, 0.000410557
Min value: 0, Max value: inf, Mean: inf, StdDev: -nan

Run instruction: @539 = ref::pow(@538,@289) -> half_type, {1, 256, 4096}, {1048576, 4096, 1}, target_id=0
Time: 1.90855ms, 1.90885ms
Output has zero, inf, normal
Output: 0.00484085, 0.259766, 0.00411987, 0.00378418, 0.00338936, ..., 0.00645065, 0.125488, 1.10269e-05, 0.00844574, 0.00102425
Min value: 0, Max value: inf, Mean: inf, StdDev: -nan
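
The fp32 listings above show pow outputs peaking around 5.7e5, well beyond the fp16 maximum of 65504, so the same pow overflows to inf in fp16 and the subsequent reduction turns into inf/NaN. A small NumPy sketch of that overflow, assuming the exponent is 2 (an RMSNorm-style square; the inputs are illustrative):

import numpy as np

x = np.array([10.0, 750.0, 0.05], dtype=np.float16)

squared_fp32 = np.power(x.astype(np.float32), 2)   # [100, 562500, 0.0025] - finite
squared_fp16 = np.power(x, np.float16(2))          # 562500 > 65504 -> inf

print(squared_fp16)          # [1.0e+02, inf, 2.5e-03]
print(squared_fp16.mean())   # inf, like the Mean in the fp16 listings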


These two issues are reported here: ROCm#2555 and ROCm#2556.

The solution will be postponed until the float propagation is resolved.

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🏗 In progress to 🚧 Blocked in MIGraphX ONNX support Feb 1, 2024
@attila-dusnoki-htec attila-dusnoki-htec removed their assignment Feb 1, 2024