Model testing with Datasets #168
We should make this work with test_runner.py. A good start would be to enable 2-3 datasets with 1-2 models.
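As a starting point, the dataset/model pairing could be expressed as a small table that test_runner.py iterates over. The sketch below is only an illustration; the names `DATASET_MODELS` and `run_model_on_dataset` do not exist in the runner yet and are assumptions.

```python
# Hypothetical sketch of pairing 2-3 datasets with a few models for test_runner.py.
# None of these names exist in the repository; they only illustrate the idea.

DATASET_MODELS = {
    "imagenet-2012": ["resnet50_v1.5", "timm-mobilenetv3-large"],
    "squad-v1.1": ["distilbert-base-cased-distilled-squad", "roberta-base-squad2"],
    "librispeech-asr": ["wav2vec2-base-960h"],
}

def run_all(run_model_on_dataset):
    """Run every (dataset, model) pair and collect the per-pair results."""
    results = {}
    for dataset, models in DATASET_MODELS.items():
        for model in models:
            results[(dataset, model)] = run_model_on_dataset(dataset, model)
    return results
```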
The current work is here, covering all 3 major data sources: ImageNet2012 (images), SQuADv1.1 (text), LibriSpeechASR (sound). Next step: enable multi-model scenarios, e.g. Whisper (encoder-decoder), Stable Diffusion (text encoder, UNet, VAE decoder), etc.
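For the Whisper encoder-decoder case, a multi-model scenario would chain the exported sub-models. The sketch below is only an illustration: the ONNX file names and the input/output tensor names are assumptions and would have to match whatever the actual export produces.

```python
# Hypothetical sketch of a multi-model (encoder-decoder) Whisper scenario.
import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("whisper-small-en_encoder.onnx")   # assumed file name
decoder = ort.InferenceSession("whisper-small-en_decoder.onnx")   # assumed file name

# Shapes taken from the logs below: (1, 80, 3000) mel features, (1, 448) token ids.
input_features = np.zeros((1, 80, 3000), dtype=np.float32)
decoder_input_ids = np.zeros((1, 448), dtype=np.int64)

# Stage 1: run the encoder, then feed its hidden states into the decoder.
encoder_hidden_states = encoder.run(None, {"input_features": input_features})[0]
logits = decoder.run(None, {
    "input_ids": decoder_input_ids,                    # assumed input name
    "encoder_hidden_states": encoder_hidden_states,    # assumed input name
})[0]
print(logits.shape)
```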
Current state

Note: The default atol/rtol values were used. For fp16, looking at the logs, the numbers look correct, but less precise than the tolerance. It might be better to also report the actual difference to make the comparison easier (a sketch of such a report follows the logs below).

We can download and generate actual test data for the following models with their datasets:

Imagenet dataset (image)

Input: pixel_values, shape: (1, 3, 224, 224)
Test "resnet50_v1.5" has 7 cases:
Passed: 0
Failed: 7

Input: pixel_values, shape: (1, 3, 224, 224)
Test "resnet50_v1.5" has 7 cases:
Passed: 7
Failed: 0

Input: input_tensor:0, shape: (1, 3, 224, 224)
Test "resnet50_v1" has 7 cases:
Passed: 1
Failed: 6

Input: input_tensor:0, shape: (1, 3, 224, 224)
Test "resnet50_v1" has 7 cases:
Passed: 7
Failed: 0

timm-mobilenetv3-large_fp16.log
Input: pixel_values, shape: (1, 3, 224, 224)
Test "timm-mobilenetv3-large" has 7 cases:
Passed: 0
Failed: 7

timm-mobilenetv3-large_fp32.log
Input: pixel_values, shape: (1, 3, 224, 224)
Test "timm-mobilenetv3-large" has 7 cases:
Passed: 7
Failed: 0

Input: pixel_values, shape: (1, 3, 224, 224)
Test "vit-base-patch16-224" has 7 cases:
Passed: 0
Failed: 7

Input: pixel_values, shape: (1, 3, 224, 224)
Test "vit-base-patch16-224" has 7 cases:
Passed: 7
Failed: 0

SQuAD dataset (text)

distilbert-base-cased-distilled-squad_fp16.log
Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Test "distilbert-base-cased-distilled-squad" has 7 cases:
Passed: 0
Failed: 7

distilbert-base-cased-distilled-squad_fp32.log
Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Test "distilbert-base-cased-distilled-squad" has 7 cases:
Passed: 7
Failed: 0

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "gpt-j" has 42 cases:
Passed: 0
Failed: 42

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "gpt-j" has 42 cases:
Passed: 42
Failed: 0

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Test "roberta-base-squad2" has 7 cases:
Passed: 0
Failed: 7

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Test "roberta-base-squad2" has 7 cases:
Passed: 7
Failed: 0

LibriSpeech dataset (audio)

Input: input_values, shape: (1, 105440)
Test "wav2vec2-base-960h" has 7 cases:
Passed: 0
Failed: 7

Input: input_values, shape: (1, 105440)
Test "wav2vec2-base-960h" has 7 cases:
Passed: 1
Failed: 6

Input: input_features, shape: (1, 80, 3000)
Input: decoder_input_ids, shape: (1, 448)
Test "whisper-small-en" has 21 cases:
Passed: 0
Failed: 21

Input: input_features, shape: (1, 80, 3000)
Input: decoder_input_ids, shape: (1, 448)
Test "whisper-small-en" has 21 cases:
Passed: 21
Failed: 0
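Regarding the note above about reporting the difference alongside the atol/rtol check: a minimal sketch of such a report is below. The helper name and how it would plug into test_runner.py are assumptions.

```python
# Minimal sketch of reporting the observed error on top of the atol/rtol check,
# so fp16 near-misses are easy to judge against the tolerance.
import numpy as np

def compare_outputs(expected, actual, rtol=1e-3, atol=1e-5):
    expected = np.asarray(expected, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    abs_diff = np.abs(expected - actual)
    rel_diff = abs_diff / np.maximum(np.abs(expected), np.finfo(np.float64).tiny)
    passed = np.allclose(actual, expected, rtol=rtol, atol=atol)
    print(f"max abs diff: {abs_diff.max():.3e} (atol={atol:.1e}), "
          f"max rel diff: {rel_diff.max():.3e} (rtol={rtol:.1e}), passed={passed}")
    return passed
```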
Current State pt2

Imagenet dataset (image)

Input: input_ids, shape: (10, 77)
Input: attention_mask, shape: (10, 77)
Input: pixel_values, shape: (1, 3, 224, 224)
Test "clip-vit-large-patch14" has 7 cases:
Passed: 0
Failed: 7

Input: input_ids, shape: (10, 77)
Input: attention_mask, shape: (10, 77)
Input: pixel_values, shape: (1, 3, 224, 224)
Test "clip-vit-large-patch14" has 7 cases:
Passed: 7
Failed: 0

SQuAD dataset (text)

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "gemma-2b-it" has 30 cases:
Passed: 0
Failed: 30

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "gemma-2b-it" has 30 cases:
Passed: 30
Failed: 0

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: decoder_input_ids, shape: (1, 384)
Test "t5-base" has 30 cases:
Passed: 0
Failed: 30

Note: All

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: decoder_input_ids, shape: (1, 384)
Test "t5-base" has 30 cases:
Passed: 30
Failed: 0
Current State pt3

SQuAD dataset (text)

Input: input_ids, shape: (1, 384)
Input: token_type_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Test "bert-large-uncased" has 5 cases:
Passed: 0
Failed: 5

Input: input_ids, shape: (1, 384)
Input: token_type_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Test "bert-large-uncased" has 5 cases:
Passed: 4
Failed: 1

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "llama2-7b-chat-hf" has 17 cases:
Passed: 0
Failed: 17

Note:

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "llama2-7b-chat-hf" has 17 cases:
Passed: 17
Failed: 0

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "llama3-8b-instruct" has 25 cases:
Passed: 0
Failed: 25

Note:

Input: input_ids, shape: (1, 384)
Input: attention_mask, shape: (1, 384)
Input: position_ids, shape: (1, 384)
Test "llama3-8b-instruct" has 25 cases:
Passed: 25
Failed: 0
Add the following models:
DLRM-DCNv2

Can't be exported to ONNX due to

But it could be created with these:

To create the dataset: https://github.com/facebookresearch/dlrm/blob/main/torchrec_dlrm/scripts/process_Criteo_1TB_Click_Logs_dataset.sh
Current State pt4

COCO dataset (image) + Style prompts (text)

stable-diffusion-2-1_text_encoder_fp16.log
Input: input_ids, shape: (2, 77)
Test "text_encoder" has 5 cases:
Passed: 0
Failed: 5

stable-diffusion-2-1_text_encoder_fp32.log
Input: input_ids, shape: (2, 77)
Test "text_encoder" has 5 cases:
Passed: 5
Failed: 0

stable-diffusion-2-1_unet_fp16.log
Input: sample, shape: (2, 4, 64, 64)
Input: encoder_hidden_states, shape: (2, 77, 1024)
Input: timestep, shape: (1,)
Test "unet" has 25 cases:
Passed: 0
Failed: 25

Note: Outputs are

stable-diffusion-2-1_unet_fp32.log
Input: sample, shape: (2, 4, 64, 64)
Input: encoder_hidden_states, shape: (2, 77, 1024)
Input: timestep, shape: (1,)
Test "unet" has 25 cases:
Passed: 25
Failed: 0

stable-diffusion-2-1_vae_decoder_fp16.log
Input: latent_sample, shape: (1, 4, 64, 64)
Test "vae_decoder" has 5 cases:
Passed: 0
Failed: 5

stable-diffusion-2-1_vae_decoder_fp32.log
Input: latent_sample, shape: (1, 4, 64, 64)
Test "vae_decoder" has 5 cases:
Passed: 5
Failed: 0

stable-diffusion-2-1_vae_encoder_fp16.log
Input: sample, shape: (1, 3, 512, 512)
Test "vae_encoder" has 5 cases:
Passed: 0
Failed: 5

Note: Some of the outputs are

stable-diffusion-2-1_vae_encoder_fp32.log
Input: sample, shape: (1, 3, 512, 512)
Test "vae_encoder" has 5 cases:
Passed: 0
Failed: 5

stable-diffusion-xl_text_encoder_2_fp16.log
Input: input_ids, shape: (2, 77)
Test "text_encoder_2" has 5 cases:
Passed: 0
Failed: 5

stable-diffusion-xl_text_encoder_2_fp32.log
Input: input_ids, shape: (2, 77)
Test "text_encoder_2" has 5 cases:
Passed: 5
Failed: 0

stable-diffusion-xl_text_encoder_fp16.log
Input: input_ids, shape: (2, 77)
Test "text_encoder" has 5 cases:
Passed: 0
Failed: 5

stable-diffusion-xl_text_encoder_fp32.log
Input: input_ids, shape: (2, 77)
Test "text_encoder" has 5 cases:
Passed: 5
Failed: 0

stable-diffusion-xl_unet_fp16.log
Input: sample, shape: (2, 4, 128, 128)
Input: encoder_hidden_states, shape: (2, 77, 2048)
Input: timestep, shape: (1,)
Input: text_embeds, shape: (2, 1280)
Input: time_ids, shape: (2, 6)
Test "unet" has 25 cases:
Passed: 0
Failed: 25

stable-diffusion-xl_unet_fp32.log
Input: sample, shape: (2, 4, 128, 128)
Input: encoder_hidden_states, shape: (2, 77, 2048)
Input: timestep, shape: (1,)
Input: text_embeds, shape: (2, 1280)
Input: time_ids, shape: (2, 6)
Test "unet" has 25 cases:
Passed: 25
Failed: 0

stable-diffusion-xl_vae_decoder_fp16.log
Input: latent_sample, shape: (1, 4, 128, 128)
Test "vae_decoder" has 5 cases:
Passed: 0
Failed: 5

Note: Outputs are

stable-diffusion-xl_vae_decoder_fp32.log
Input: latent_sample, shape: (1, 4, 128, 128)
Test "vae_decoder" has 5 cases:
Passed: 5
Failed: 0

stable-diffusion-xl_vae_encoder_fp16.log
Input: sample, shape: (1, 3, 1024, 1024)
Test "vae_encoder" has 5 cases:
Passed: 0
Failed: 5

Note: Outputs are

stable-diffusion-xl_vae_encoder_fp32.log
Input: sample, shape: (1, 3, 1024, 1024)
Test "vae_encoder" has 5 cases:
Passed: 0
Failed: 5
To properly test model accuracy, it is not enough to use random data, since it might not cover the realistic range of possible inputs.
We should collect candidate datasets and assign models to them.
The idea is to use public datasets. HuggingFace provides datasets, and also Python helpers to load them.
Downloading is not enough, since each model has pre- and post-processing steps, which can vary from model to model.
The whole process should be automatic and deterministic.
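As an illustration of what the automatic path could look like, the `datasets` and `transformers` helpers can be combined roughly as below. This is only a sketch: the dataset id, checkpoint name, and output file name are placeholders, and imagenet-1k is gated on the Hub, so it requires authentication.

```python
# Hypothetical sketch: fetch one ImageNet validation sample and run the
# model-specific pre-processing to produce the pixel_values tensor the logs
# above expect, shape (1, 3, 224, 224).
import numpy as np
from datasets import load_dataset
from transformers import AutoImageProcessor

ds = load_dataset("imagenet-1k", split="validation", streaming=True)  # gated: needs HF login
sample = next(iter(ds))  # deterministic for a fixed dataset revision

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")  # placeholder checkpoint
inputs = processor(images=sample["image"], return_tensors="np")

np.save("pixel_values_0.npy", inputs["pixel_values"])  # placeholder file name
print(inputs["pixel_values"].shape)  # (1, 3, 224, 224)
```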