Model testing with Datasets #168

Open
attila-dusnoki-htec opened this issue Feb 16, 2024 · 8 comments

@attila-dusnoki-htec

To properly test model accuracy, it is not enough to use random data, since it may not cover the realistic range of possible inputs.

We should collect candidate datasets and assign models to them.

The idea is to use public datasets. HuggingFace hosts datasets and also provides helpers to load them in Python.
Downloading alone is not enough, since each model has its own pre- and post-processing steps, and these vary from model to model.

The whole process should be automatic and deterministic.
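
A minimal sketch of what that flow could look like, assuming the HuggingFace `datasets`/`transformers` helpers, with SQuAD, a distilbert checkpoint, and the 384 sequence length used purely as illustrative assumptions:

```python
# Sketch only: load a public dataset, apply the model-specific preprocessing,
# and dump deterministic arrays that can later be turned into test cases.
from datasets import load_dataset
from transformers import AutoTokenizer
import numpy as np

dataset = load_dataset("squad", split="validation[:8]")  # assumed dataset/slice
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

for i, sample in enumerate(dataset):
    enc = tokenizer(sample["question"], sample["context"],
                    padding="max_length", truncation=True,
                    max_length=384, return_tensors="np")
    # These arrays become the deterministic inputs for one test case.
    np.save(f"case_{i}_input_ids.npy", enc["input_ids"])
    np.save(f"case_{i}_attention_mask.npy", enc["attention_mask"])
```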

@attila-dusnoki-htec attila-dusnoki-htec converted this from a draft issue Feb 16, 2024
@attila-dusnoki-htec attila-dusnoki-htec moved this from 🆕 New to 🔖 Ready in MIGraphX ONNX support Feb 16, 2024
@attila-dusnoki-htec
Author

We should make this work with test_runner.py

A good start would be to enable 2-3 datasets with 1-2 models.
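
If test_runner.py consumes the ONNX backend-test layout (a model.onnx alongside test_data_set_N directories of serialized TensorProtos) — that layout is an assumption here — one generated test case could be written out roughly like this:

```python
# Sketch: write one test case in the ONNX backend-test style layout
# (test_data_set_0/input_*.pb, output_*.pb). Whether test_runner.py expects
# exactly this layout is an assumption.
import os
from onnx import numpy_helper

def save_case(case_dir, inputs, outputs):
    """inputs/outputs: lists of (name, numpy array) pairs."""
    os.makedirs(case_dir, exist_ok=True)
    for prefix, tensors in (("input", inputs), ("output", outputs)):
        for i, (name, arr) in enumerate(tensors):
            proto = numpy_helper.from_array(arr, name=name)
            with open(os.path.join(case_dir, f"{prefix}_{i}.pb"), "wb") as f:
                f.write(proto.SerializeToString())

# e.g. save_case("distilbert/test_data_set_0",
#                [("input_ids", ids), ("attention_mask", mask)],
#                [("start_logits", start), ("end_logits", end)])
```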

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🔖 Ready to 🏗 In progress in MIGraphX ONNX support Feb 22, 2024
@attila-dusnoki-htec attila-dusnoki-htec self-assigned this Feb 22, 2024
@attila-dusnoki-htec
Author

The current work is here

It covers all 3 major data sources: ImageNet2012 (images), SQuADv1.1 (text) and LibriSpeechASR (audio),
with single-model pipelines that apply the proper preprocessing steps for each.
The result is in a format that test_runner.py can run.

Next step: enable multi-model scenarios, e.g. Whisper (encoder-decoder), Stable Diffusion (text-encoder, unet, vae-decoder), etc.
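
As a rough illustration of what a multi-model scenario means here, an encoder-decoder pair run back-to-back with onnxruntime (file and tensor names below are placeholders, not the actual exported models):

```python
# Sketch of a multi-model (encoder-decoder) scenario: run the encoder once,
# then feed its output into the decoder. File/tensor names are placeholders.
import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("whisper_encoder.onnx")
decoder = ort.InferenceSession("whisper_decoder.onnx")

input_features = np.zeros((1, 80, 3000), dtype=np.float32)   # preprocessed audio features
decoder_input_ids = np.zeros((1, 448), dtype=np.int64)       # prompt / start tokens

(encoder_hidden_states,) = encoder.run(None, {"input_features": input_features})
logits = decoder.run(None, {"input_ids": decoder_input_ids,
                            "encoder_hidden_states": encoder_hidden_states})[0]
# Each stage's (inputs, outputs) pair would become its own test case.
```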

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🏗 In progress to 🔖 Ready in MIGraphX ONNX support Mar 7, 2024
@attila-dusnoki-htec attila-dusnoki-htec moved this from 🔖 Ready to 🏗 In progress in MIGraphX ONNX support Apr 3, 2024
@attila-dusnoki-htec attila-dusnoki-htec moved this from 🏗 In progress to 🔖 Ready in MIGraphX ONNX support Apr 5, 2024
@attila-dusnoki-htec
Author

attila-dusnoki-htec commented Apr 15, 2024

Current State

Code link to reproduce

Note: The default atol/rtol values were used. For fp16, looking at the logs, the numbers look correct but are less precise than the tolerance allows. It might be better to also report the actual difference to make the comparison easier.
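
A small sketch of what reporting the difference could look like (plain numpy; the tolerance defaults below are placeholders, not test_runner.py's actual values):

```python
# Sketch: print how far off the result actually is, instead of only pass/fail,
# so fp16 runs are easier to judge against the tolerance.
import numpy as np

def compare(expected, actual, atol=1e-3, rtol=1e-3):
    abs_err = np.abs(expected.astype(np.float64) - actual.astype(np.float64))
    rel_err = abs_err / np.maximum(np.abs(expected).astype(np.float64), 1e-12)
    print(f"max abs err: {abs_err.max():.3e} (atol={atol})")
    print(f"max rel err: {rel_err.max():.3e} (rtol={rtol})")
    return np.allclose(actual, expected, atol=atol, rtol=rtol)
```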

We can download and generate actual test data for the following models with their datasets:

Imagenet dataset (image)

resnet50_v1.5_fp16.log

Input: pixel_values, shape: (1, 3, 224, 224)

Test "resnet50_v1.5" has 7 cases:
	 Passed: 0
	 Failed: 7

resnet50_v1.5_fp32.log

Input: pixel_values, shape: (1, 3, 224, 224)

Test "resnet50_v1.5" has 7 cases:
	 Passed: 7
	 Failed: 0

resnet50_v1_fp16.log

Input: input_tensor:0, shape: (1, 3, 224, 224)

Test "resnet50_v1" has 7 cases:
	 Passed: 1
	 Failed: 6

resnet50_v1_fp32.log

Input: input_tensor:0, shape: (1, 3, 224, 224)

Test "resnet50_v1" has 7 cases:
	 Passed: 7
	 Failed: 0

timm-mobilenetv3-large_fp16.log

Input: pixel_values, shape: (1, 3, 224, 224)

Test "timm-mobilenetv3-large" has 7 cases:
	 Passed: 0
	 Failed: 7

timm-mobilenetv3-large_fp32.log

Input: pixel_values, shape: (1, 3, 224, 224)

Test "timm-mobilenetv3-large" has 7 cases:
	 Passed: 7
	 Failed: 0

vit-base-patch16-224_fp16.log

Input: pixel_values, shape: (1, 3, 224, 224)

Test "vit-base-patch16-224" has 7 cases:
	 Passed: 0
	 Failed: 7

vit-base-patch16-224_fp32.log

Input: pixel_values, shape: (1, 3, 224, 224)

Test "vit-base-patch16-224" has 7 cases:
	 Passed: 7
	 Failed: 0

SQuAD dataset (text)

distilbert-base-cased-distilled-squad_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)

Test "distilbert-base-cased-distilled-squad" has 7 cases:
	 Passed: 0
	 Failed: 7

distilbert-base-cased-distilled-squad_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)

Test "distilbert-base-cased-distilled-squad" has 7 cases:
	 Passed: 7
	 Failed: 0

gpt-j_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "gpt-j" has 42 cases:
	 Passed: 0
	 Failed: 42

gpt-j_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "gpt-j" has 42 cases:
	 Passed: 42
	 Failed: 0

roberta-base-squad2_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)

Test "roberta-base-squad2" has 7 cases:
	 Passed: 0
	 Failed: 7

roberta-base-squad2_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)

Test "roberta-base-squad2" has 7 cases:
	 Passed: 7
	 Failed: 0

LibriSpeech dataset (audio)

wav2vec2-base-960h_fp16.log

Input: input_values, shape: (1, 105440)

Test "wav2vec2-base-960h" has 7 cases:
	 Passed: 0
	 Failed: 7

wav2vec2-base-960h_fp32.log

Input: input_values, shape: (1, 105440)

Test "wav2vec2-base-960h" has 7 cases:
	 Passed: 1
	 Failed: 6

whisper-small-en_fp16.log

Input: input_features, shape: (1, 80, 3000)
Input: decoder_input_ids, shape: (1, 448)

Test "whisper-small-en" has 21 cases:
	 Passed: 0
	 Failed: 21

whisper-small-en_fp32.log

Input: input_features, shape: (1, 80, 3000)
Input: decoder_input_ids, shape: (1, 448)

Test "whisper-small-en" has 21 cases:
	 Passed: 21
	 Failed: 0

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🔖 Ready to 🏗 In progress in MIGraphX ONNX support Apr 15, 2024
@attila-dusnoki-htec
Author

Current State pt2

Imagenet dataset (image)

clip_vit_fp16.log

Input: input_ids, shape: (10, 77)
Input: attention_mask, shape: (10, 77)
Input: pixel_values, shape: (1, 3, 224, 224)

Test "clip-vit-large-patch14" has 7 cases:
	 Passed: 0
	 Failed: 7

clip_vit_fp32.log

Input: input_ids, shape: (10, 77)
Input: attention_mask, shape: (10, 77)
Input: pixel_values, shape: (1, 3, 224, 224)

Test "clip-vit-large-patch14" has 7 cases:
	 Passed: 7
	 Failed: 0

SQuAD dataset (text)

gemma_2b_it_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "gemma-2b-it" has 30 cases:
	 Passed: 0
	 Failed: 30

gemma_2b_it_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "gemma-2b-it" has 30 cases:
	 Passed: 30
	 Failed: 0

t5_base_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: decoder_input_ids, shape: (1, 384)

Test "t5-base" has 30 cases:
	 Passed: 0
	 Failed: 30

Note: All actual output values are zero!

t5_base_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: decoder_input_ids, shape: (1, 384)

Test "t5-base" has 30 cases:
	 Passed: 30
	 Failed: 0

@attila-dusnoki-htec
Author

attila-dusnoki-htec commented Apr 23, 2024

Current State pt3

SQuAD dataset (text)

bert-large-uncased_fp16.log

Input: input_ids, shape: (1, 384)
Input: token_type_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)

Test "bert-large-uncased" has 5 cases:
	 Passed: 0
	 Failed: 5

bert-large-uncased_fp32.log

Input: input_ids, shape: (1, 384)
Input: token_type_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)

Test "bert-large-uncased" has 5 cases:
	 Passed: 4
	 Failed: 1

llama2-7b-chat-hf_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "llama2-7b-chat-hf" has 17 cases:
	 Passed: 0
	 Failed: 17

Note: Major difference, not just a simple tolerance issue.

llama2-7b-chat-hf_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "llama2-7b-chat-hf" has 17 cases:
	 Passed: 17
	 Failed: 0

llama3-8b-instruct_fp16.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "llama3-8b-instruct" has 25 cases:
	 Passed: 0
	 Failed: 25

Note: Major difference, not just a simple tolerance issue.

llama3-8b-instruct_fp32.log

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)

Test "llama3-8b-instruct" has 25 cases:
	 Passed: 25
	 Failed: 0

@attila-dusnoki-htec attila-dusnoki-htec moved this from 🏗 In progress to 👀 Review requested in MIGraphX ONNX support Apr 24, 2024
@attila-dusnoki-htec
Author

Add the following models:

  • DLRM-DCNv2
  • Wide and Deep
  • SD 1.5/XL
  • RNNT
  • BERT-L
  • ResNet50 v1.5

@attila-dusnoki-htec attila-dusnoki-htec moved this from 👀 Review requested to 🏗 In progress in MIGraphX ONNX support Apr 29, 2024
@attila-dusnoki-htec
Author

attila-dusnoki-htec commented Apr 29, 2024

DLRM-DCNv2

It can't be exported to ONNX directly, because the forward pass takes a KeyedJaggedTensor as input.

But it could be created with these:
dlrm_model.py
export_dlrm.py

To create the dataset: https://github.com/facebookresearch/dlrm/blob/main/torchrec_dlrm/scripts/process_Criteo_1TB_Click_Logs_dataset.sh
The scripts can be changed to only use the day_23 file.
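
A common way around the KeyedJaggedTensor limitation is to wrap the model so the exported graph only sees plain tensors and the jagged batch is rebuilt inside the wrapper; the sketch below only illustrates that idea and is not the contents of the attached dlrm_model.py/export_dlrm.py:

```python
# Sketch: hide the KeyedJaggedTensor behind a wrapper so torch.onnx.export only
# sees plain tensors. Illustrative only; not the actual dlrm_model.py/export_dlrm.py.
import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

class DLRMExportWrapper(torch.nn.Module):
    def __init__(self, dlrm, sparse_feature_keys):
        super().__init__()
        self.dlrm = dlrm
        self.keys = sparse_feature_keys

    def forward(self, dense_features, sparse_values, sparse_lengths):
        # Rebuild the jagged sparse batch from flat tensors inside the graph.
        kjt = KeyedJaggedTensor.from_lengths_sync(
            keys=self.keys, values=sparse_values, lengths=sparse_lengths)
        return self.dlrm(dense_features, kjt)

# torch.onnx.export(DLRMExportWrapper(model, keys),
#                   (dense, values, lengths), "dlrm_dcnv2.onnx", opset_version=17)
```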

@attila-dusnoki-htec
Author

Current State pt4

COCO dataset (image) + Style prompts (text)

stable-diffusion-2-1_text_encoder_fp16.log

Input: input_ids, shape: (2, 77)

Test "text_encoder" has 5 cases:
	 Passed: 0
	 Failed: 5

stable-diffusion-2-1_text_encoder_fp32.log

Input: input_ids, shape: (2, 77)

Test "text_encoder" has 5 cases:
	 Passed: 5
	 Failed: 0

stable-diffusion-2-1_unet_fp16.log

Input: sample, shape: (2, 4, 64, 64)
Input: encoder_hidden_states, shape: (2, 77, 1024)
Input: timestep, shape: (1,)

Test "unet" has 25 cases:
	 Passed: 0
	 Failed: 25

Note: Outputs are NaNs

stable-diffusion-2-1_unet_fp32.log

Input: sample, shape: (2, 4, 64, 64)
Input: encoder_hidden_states, shape: (2, 77, 1024)
Input: timestep, shape: (1,)

Test "unet" has 25 cases:
	 Passed: 25
	 Failed: 0

stable-diffusion-2-1_vae_decoder_fp16.log

Input: latent_sample, shape: (1, 4, 64, 64)

Test "vae_decoder" has 5 cases:
	 Passed: 0
	 Failed: 5

stable-diffusion-2-1_vae_decoder_fp32.log

Input: latent_sample, shape: (1, 4, 64, 64)

Test "vae_decoder" has 5 cases:
	 Passed: 5
	 Failed: 0

stable-diffusion-2-1_vae_encoder_fp16.log

Input: sample, shape: (1, 3, 512, 512)

Test "vae_encoder" has 5 cases:
	 Passed: 0
	 Failed: 5

Note: Some of the outputs are NaNs

stable-diffusion-2-1_vae_encoder_fp32.log

Input: sample, shape: (1, 3, 512, 512)

Test "vae_encoder" has 5 cases:
	 Passed: 0
	 Failed: 5

stable-diffusion-xl_text_encoder_2_fp16.log

Input: input_ids, shape: (2, 77)

Test "text_encoder_2" has 5 cases:
	 Passed: 0
	 Failed: 5

stable-diffusion-xl_text_encoder_2_fp32.log

Input: input_ids, shape: (2, 77)

Test "text_encoder_2" has 5 cases:
	 Passed: 5
	 Failed: 0

stable-diffusion-xl_text_encoder_fp16.log

Input: input_ids, shape: (2, 77)

Test "text_encoder" has 5 cases:
	 Passed: 0
	 Failed: 5

stable-diffusion-xl_text_encoder_fp32.log

Input: input_ids, shape: (2, 77)

Test "text_encoder" has 5 cases:
	 Passed: 5
	 Failed: 0

stable-diffusion-xl_unet_fp16.log

Input: sample, shape: (2, 4, 128, 128)
Input: encoder_hidden_states, shape: (2, 77, 2048)
Input: timestep, shape: (1,)
Input: text_embeds, shape: (2, 1280)
Input: time_ids, shape: (2, 6)

Test "unet" has 25 cases:
	 Passed: 0
	 Failed: 25

stable-diffusion-xl_unet_fp32.log

Input: sample, shape: (2, 4, 128, 128)
Input: encoder_hidden_states, shape: (2, 77, 2048)
Input: timestep, shape: (1,)
Input: text_embeds, shape: (2, 1280)
Input: time_ids, shape: (2, 6)

Test "unet" has 25 cases:
	 Passed: 25
	 Failed: 0

stable-diffusion-xl_vae_decoder_fp16.log

Input: latent_sample, shape: (1, 4, 128, 128)

Test "vae_decoder" has 5 cases:
	 Passed: 0
	 Failed: 5

Note: Outputs are NaNs

stable-diffusion-xl_vae_decoder_fp32.log

Input: latent_sample, shape: (1, 4, 128, 128)

Test "vae_decoder" has 5 cases:
	 Passed: 5
	 Failed: 0

stable-diffusion-xl_vae_encoder_fp16.log

Input: sample, shape: (1, 3, 1024, 1024)

Test "vae_encoder" has 5 cases:
	 Passed: 0
	 Failed: 5

Note: Outputs are NaNs

stable-diffusion-xl_vae_encoder_fp32.log

Input: sample, shape: (1, 3, 1024, 1024)

Test "vae_encoder" has 5 cases:
	 Passed: 0
	 Failed: 5
