Add support for SD3 testing along with a refactor of the suite. #266

Open · wants to merge 14 commits into base: main
Conversation

@saienduri (Contributor) commented Jun 15, 2024

This commit adds the support needed to test all of the SD3 submodels.
It also refactors the test suite, with changes along the way to get everything working smoothly.
We no longer have input, output, and module flags coming from the JSON configs.
Enabled expansion of the real-weights flag file so that environment variables can be used while running the module. With this, I switched to using an environment variable so that a single real_weights file works for sdxl-scheduled-unet (covering both the CPU and ROCm pipeline vmfbs); a sketch of that expansion follows below.
These changes allow us to group SD3 and SDXL testing into the same JSON configs for a clean integration :)
All the Azure artifacts for each submodel (inputs, outputs, weights) can be found in sharkpublic/sai/sd3-mmdit, sharkpublic/sai/sd3-vae, and sharkpublic/sai/sd3-prompt-encoder. The MLIRs and flags are all set up here.
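
For illustration only, here is a minimal sketch of the flag-file expansion idea (the helper name expand_flagfile is hypothetical, not code from this PR): each line of the flag file is read and any ${VAR} references are expanded with os.path.expandvars before being appended to the run arguments.

    import os

    def expand_flagfile(flag_file_path, run_args):
        # Hypothetical helper, not the PR's actual code: read each flag line
        # and expand ${ENV_VAR} references (e.g. a ${IREE_TEST_BACKEND}
        # placeholder inside a real_weights path) before appending it to the
        # run arguments, so logs show the fully resolved flags.
        with open(flag_file_path) as flag_file:
            for line in flag_file:
                line = line.strip()
                if line:
                    run_args.append(os.path.expandvars(line))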

@saienduri saienduri marked this pull request as draft June 15, 2024 11:36
@saienduri saienduri changed the title Sd3 setup Add support for SD3 testing along with a refactor of the suite. Jun 15, 2024
@saienduri saienduri marked this pull request as ready for review June 15, 2024 12:49
@saienduri saienduri requested a review from ScottTodd June 17, 2024 17:29
@dan-garvey (Member) commented:

Looks good to me for now. The only thing is that we should move the artifacts to a "stable" folder of some sort, and then update it regularly with artifacts generated on newer versions of turbine as soon as they pass.

@ScottTodd (Member) left a comment:

This may be the breaking point for complexity in the current conftest.py - it's not the right tool for these special tests. Let's find a way to refactor to get this coverage without needing to bend the config file directories, workflow files, conftest.py test collector/runner, etc.

Comment on lines 12 to +18
"skip_compile_tests": [
"sdxl-scheduled-unet-3-tank",
"sdxl-vae-decode-tank",
"sdxl-prompt-encoder-tank"
"sdxl-prompt-encoder-tank",
"sd3-mmdit",
"sd3-vae-decode",
"sd3-prompt-encoder"
Member:

If this is going to exclude all these special models they should be moved to a different subdirectory.

The config_*_models configs were intended to test a set of models that were imported into the test suite in a uniform way and each of these models is special in some way.

Comment on lines +323 to +327
    # expand data flag file, so beter for logging and can use environment variables
    flag_file_path = f"{self.test_cwd}/{self.spec.data_flagfile_name}"
    file = open(flag_file_path)
    for line in file:
        self.run_args.append(line.rstrip())
Member:

Comment style: start with an uppercase character, end with a period. Also fix typo and adjust wording.

Suggested change
    - # expand data flag file, so beter for logging and can use environment variables
    + # Expand data flag files to make logs explicit
    + # Tools accept `--flagfile=/path/to/flagfile` but logs are easier
    + # to read with explicit `--flag1=value1 --flag2=value2` flags.
      flag_file_path = f"{self.test_cwd}/{self.spec.data_flagfile_name}"
      file = open(flag_file_path)
      for line in file:
          self.run_args.append(line.rstrip())

Note that flagfiles are a requirement for some commands and environments. For example, some terminals limit commands to around 512 characters, and putting flags in files works around that.

Comment on lines +411 to +418
    run_env["IREE_TEST_BACKEND"] = os.getenv(
        "IREE_TEST_BACKEND", default="none"
    )

    # expand environment variable for logging
    backend = run_env["IREE_TEST_BACKEND"]
    cmd = subprocess.list2cmdline(self.run_args)
    cmd = cmd.replace("${IREE_TEST_BACKEND}", f"{backend}")
Member:

This is too sketchy IMO. Flagfiles should be runnable as-is and this is adding an extra indirection that will be too difficult to reproduce outside of a CI environment. Any tests needing this behavior should be using a mechanism other than this conftest.py.

Member:

Let me elaborate a bit...

This iree_tests subproject is home to:

  • model sources
  • test inputs
  • test outputs
  • utilities for converting into the test suite format
  • utilities for helping run test suites
  • simple examples of test suite instantiations

For large test suites following a standardized style (ONNX unit tests, ONNX models, StableHLO models, JAX programs, etc.), iree_tests/conftest.py and the config files there give us a lightweight way to define test cases that can be piped through iree-compile -> iree-run-module using a shared set of flags for each test configuration (compile options + run options).

For SDXL, SD3, llama, and other models that we're giving special attention, we should be testing both the out of the box import -> compile -> run path that fits that mold and a curated path like https://github.com/iree-org/iree/blob/main/experimental/regression_suite/tests/pregenerated/test_llama2.py or https://github.com/nod-ai/sdxl-scripts .

  • We can keep a separation between the model sources here (including any scripts needed to get from frameworks, safetensors files, requirements.txt files, etc. down to .mlir and .irpa files) and downstream test instantiations - we'll just end up with more (I'm assuming Python) code downstream to set up exactly what we want tested.

At the point where a model needs a carve-out in a config.json file, an environment variable, a nonstandard file (spec file), or changes to conftest.py, it is too complex/different and should be given its own separate test.

  • We saw some confusion today on Discord when one of these special tests was newly passing here. It took several of us some time to find which config file needed updating, since the test was running as part of "sdxl_scheduled_unet" and not "pytorch_models".

We can share test code between the standardized path and custom model tests where it makes sense to do so. In particular, the "compile a program" and "run a program" parts could be fixtures (as they are in https://github.com/iree-org/iree/blob/main/experimental/regression_suite/ireers/fixtures.py). We should have a common way for those stages to run, with the same logging format and the same error messages when tests are unexpectedly passing, newly failing, etc.
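
As a rough illustration of that idea, the shared stages could look something like the sketch below; the fixture names and flag handling here are assumptions for the sketch, not the existing ireers API.

    import subprocess

    import pytest

    @pytest.fixture
    def iree_compile():
        # Hypothetical shared "compile a program" stage: invoke iree-compile
        # with a test's flags and return the path to the produced .vmfb.
        def _compile(mlir_path, vmfb_path, flags):
            subprocess.run(
                ["iree-compile", str(mlir_path), "-o", str(vmfb_path), *flags],
                check=True,
            )
            return vmfb_path
        return _compile

    @pytest.fixture
    def iree_run_module():
        # Hypothetical shared "run a program" stage: invoke iree-run-module
        # with explicit flags so every test logs the same way.
        def _run(vmfb_path, flags):
            return subprocess.run(
                ["iree-run-module", f"--module={vmfb_path}", *flags],
                check=True,
                capture_output=True,
                text=True,
            )
        return _run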

@saienduri (Contributor, author):

Yeah, this makes sense. I will take a look at an alternate path for the custom models.

@saienduri (Contributor, author):

We can still use the same downloading utilities (download_remote_files.py) to fetch all the sources though, right? We'll just have a different test_cases.json for the script to parse in the custom path?

Member:

We can keep the current downloading, or we could follow the fetch_source_fixture style in https://github.com/iree-org/iree/blob/6f178692f11d356a284eba915f2cf1d067d92a69/experimental/regression_suite/tests/pregenerated/test_llama2.py#L21-L24 . If we're explicitly setting up test cases like in that file then we could just fully script it all there, no need for a separate .json file.

IDK, and I can't take a deep context switch right now to think through the design.

I'd start with that test_llama2.py and adapt it to the separate repo model (test sources, inputs, and outputs in one repo; test configurations in another).
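
For example (purely illustrative; the helper name and cache layout below are assumptions, not the behavior of download_remote_files.py or fetch_source_fixture), a download-to-cache step might look roughly like this:

    import os
    import urllib.request
    from pathlib import Path

    def fetch_source(url, cache_dir="~/.cache/iree_tests"):
        # Hypothetical helper: download a remote artifact into a local cache
        # directory, skipping the download if the file is already present.
        cache = Path(os.path.expanduser(cache_dir))
        cache.mkdir(parents=True, exist_ok=True)
        local_path = cache / Path(url).name
        if not local_path.exists():
            urllib.request.urlretrieve(url, local_path)
        return local_path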

@saienduri (Contributor, author):

Yeah, no worries. I'd rather stick with a unified downloading process (making use of the same tools) to a cache in the test suite, and unified compile/run through a fixture as you suggested, so we use the same tools across the whole repo where we can. I will proceed with that for now.
