Add support for SD3 testing along with a refactor of the suite. #266
Conversation
Looks good to me for now. The only thing is we should move the artifacts to a "stable" folder of some sort, and then regularly update that folder with artifacts generated on newer versions of turbine as soon as they pass.
This may be the breaking point for complexity in the current conftest.py - it's not the right tool for these special tests. Let's find a way to refactor to get this coverage without needing to bend the config file directories, workflow files, conftest.py test collector/runner, etc.
"skip_compile_tests": [ | ||
"sdxl-scheduled-unet-3-tank", | ||
"sdxl-vae-decode-tank", | ||
"sdxl-prompt-encoder-tank" | ||
"sdxl-prompt-encoder-tank", | ||
"sd3-mmdit", | ||
"sd3-vae-decode", | ||
"sd3-prompt-encoder" |
If this is going to exclude all these special models, they should be moved to a different subdirectory. The config_*_models configs were intended to test a set of models that were imported into the test suite in a uniform way, and each of these models is special in some way.
    # expand data flag file, so beter for logging and can use environment variables
    flag_file_path = f"{self.test_cwd}/{self.spec.data_flagfile_name}"
    file = open(flag_file_path)
    for line in file:
        self.run_args.append(line.rstrip())
Comment style: start with an uppercase character and end with a period. Also fix the typo and adjust the wording.
-   # expand data flag file, so beter for logging and can use environment variables
+   # Expand data flag files to make logs explicit
+   # Tools accept `--flagfile=/path/to/flagfile` but logs are easier
+   # to read with explicit `--flag1=value1 --flag2=value2` flags.
    flag_file_path = f"{self.test_cwd}/{self.spec.data_flagfile_name}"
    file = open(flag_file_path)
    for line in file:
        self.run_args.append(line.rstrip())
Note that flagfiles are a requirement for some commands and environments. For example, certain terminals limit commands to roughly 512 characters, and putting flags in files works around that.
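As a rough sketch of that expansion with the suggested comment style and a context manager (the function name and signature here are hypothetical, not code from this PR):

    import os

    def expand_data_flagfile(test_cwd: str, data_flagfile_name: str, run_args: list[str]) -> None:
        # Expand the data flagfile so each flag shows up explicitly in logs
        # and can reference environment variables later in the pipeline.
        flag_file_path = os.path.join(test_cwd, data_flagfile_name)
        with open(flag_file_path) as flag_file:
            for line in flag_file:
                flag = line.strip()
                if flag:  # Skip blank lines.
                    run_args.append(flag)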
run_env["IREE_TEST_BACKEND"] = os.getenv( | ||
"IREE_TEST_BACKEND", default="none" | ||
) | ||
|
||
# expand environment variable for logging | ||
backend = run_env["IREE_TEST_BACKEND"] | ||
cmd = subprocess.list2cmdline(self.run_args) | ||
cmd = cmd.replace("${IREE_TEST_BACKEND}", f"{backend}") |
This is too sketchy IMO. Flagfiles should be runnable as-is and this is adding an extra indirection that will be too difficult to reproduce outside of a CI environment. Any tests needing this behavior should be using a mechanism other than this conftest.py.
Let me elaborate a bit...
This iree_tests subproject is home to:
- model sources
- test inputs
- test outputs
- utilities for converting into the test suite format
- utilities for helping run test suites
- simple examples of test suite instantiations
For large test suites following a standardized style (ONNX unit tests, ONNX models, StableHLO models, JAX programs, etc.), iree_tests/conftest.py and the config files there give us a lightweight way to define test cases that can be piped through iree-compile -> iree-run-module using a shared set of flags for each test configuration (compile options + run options).
For SDXL, SD3, llama, and other models that we're giving special attention, we should be testing both the out of the box import -> compile -> run path that fits that mold and a curated path like https://github.com/iree-org/iree/blob/main/experimental/regression_suite/tests/pregenerated/test_llama2.py or https://github.com/nod-ai/sdxl-scripts .
- We can keep a separation between the model sources here (including any scripts needed to get from frameworks, safetensors files, requirements.txt files, etc. down to .mlir and .irpa files) and downstream test instantiations - we'll just end up with more (I'm assuming Python) code downstream to set up exactly what we want tested.
At the point where a model needs a carve-out in a config.json file, an environment variable, a nonstandard file (spec file), or changes to conftest.py, it is too complex/different and should be given its own separate test.
- We saw some confusion today on Discord when one of these special tests was newly passing here. It took several of us some time to find which config file needed updating, since the test was running as part of "sdxl_scheduled_unet" and not "pytorch_models".
We can share test code between the standardized path and custom model tests where it makes sense to do so. In particular, the "compile a program" and "run a program" parts could be fixtures (as they are in https://github.com/iree-org/iree/blob/main/experimental/regression_suite/ireers/fixtures.py). We should have a common way for those stages to run, with the same logging format and the same error messages when tests are unexpectedly passing, newly failing, etc.
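A rough sketch of what shared compile/run fixtures could look like, loosely modeled on the linked ireers/fixtures.py (the fixture names and helper functions below are hypothetical, not the existing API):

    import subprocess
    import pytest

    @pytest.fixture
    def iree_compile():
        """Compile a .mlir source to a .vmfb with shared flags and logging."""
        def _compile(mlir_path, vmfb_path, extra_flags=()):
            cmd = ["iree-compile", str(mlir_path), "-o", str(vmfb_path), *extra_flags]
            print("Compiling:", subprocess.list2cmdline(cmd))
            subprocess.run(cmd, check=True)
            return vmfb_path
        return _compile

    @pytest.fixture
    def iree_run_module():
        """Run a compiled module and return the completed process for checks."""
        def _run(vmfb_path, extra_flags=()):
            cmd = ["iree-run-module", f"--module={vmfb_path}", *extra_flags]
            print("Running:", subprocess.list2cmdline(cmd))
            return subprocess.run(cmd, check=True, capture_output=True, text=True)
        return _run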
Yeah, this makes sense. I will take a look at an alternate path for the custom models.
We can still use the same downloading utilities (download_remote_files.py) to fetch all the sources though, right? We'll just have a different test_cases.json for the script to parse in the custom path?
We can keep the current downloading, or we could follow the fetch_source_fixture style in https://github.com/iree-org/iree/blob/6f178692f11d356a284eba915f2cf1d067d92a69/experimental/regression_suite/tests/pregenerated/test_llama2.py#L21-L24. If we're explicitly setting up test cases like in that file, then we could just fully script it all there, with no need for a separate .json file.
IDK, and I can't take a deep context switch right now to think through the design.
I'd start with that test_llama2.py and adapt it to the separate repo model (test sources, inputs, and outputs in one repo, test configurations in another).
Yeah, no worries. I'd rather stick to a unified downloading process (making use of the same tools) into a cache in the test suite, and unified compile/run through a fixture as you suggested, so we are using the same tools across the whole repo where we can. I will proceed with that for now.
This commit adds the necessary support for integrating tests for all the sd3 submodels.
Also refactored the test suite and made changes along the way to get everything working smoothly.
We no longer have input, output, and module flags coming from the json configs.
Enabled expansion of the real-weights flag file, so we can use environment variables while running the module. With this, I switched to using an environment variable so we can use just one real_weights file for sdxl-scheduled-unet (covering both the cpu and rocm pipeline vmfbs).
These changes allow us to group sd3 and sdxl testing into the same json configs for a clean integration :)
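To illustrate the environment-variable expansion described above, a minimal sketch (the flag and module filename are hypothetical placeholders, not the actual artifacts in the suite):

    import os
    import subprocess

    # The backend-specific part of a path is written as ${IREE_TEST_BACKEND}
    # in the shared real_weights flag file and substituted at run time.
    run_args = ["iree-run-module", "--module=sdxl_scheduled_unet_${IREE_TEST_BACKEND}.vmfb"]
    os.environ.setdefault("IREE_TEST_BACKEND", "rocm")
    backend = os.environ["IREE_TEST_BACKEND"]
    cmd = subprocess.list2cmdline(run_args).replace("${IREE_TEST_BACKEND}", backend)
    print(cmd)  # iree-run-module --module=sdxl_scheduled_unet_rocm.vmfb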
All the Azure artifacts for each submodel (inputs, outputs, weights) can be found in sharkpublic/sai/sd3-mmdit, sharkpublic/sai/sd3-vae, and sharkpublic/sai/sd3-prompt-encoder. The mlirs and flags are all set up here.