Add support for SD3 testing along with a refactor of the suite. #266
Conversation
Looks good to me for now. The only thing is we should move the artifacts to a "stable" folder of some sort, and then regularly update that folder with artifacts generated on newer versions of turbine as soon as they pass.
This may be the breaking point for complexity in the current conftest.py - it's not the right tool for these special tests. Let's find a way to refactor to get this coverage without needing to bend the config file directories, workflow files, conftest.py test collector/runner, etc.
"skip_compile_tests": [ | ||
"sdxl-scheduled-unet-3-tank", | ||
"sdxl-vae-decode-tank", | ||
"sdxl-prompt-encoder-tank" | ||
"sdxl-prompt-encoder-tank", | ||
"sd3-mmdit", | ||
"sd3-vae-decode", | ||
"sd3-prompt-encoder" |
If this is going to exclude all these special models, they should be moved to a different subdirectory. The config_*_models configs were intended to test a set of models that were imported into the test suite in a uniform way, and each of these models is special in some way.
    # expand data flag file, so beter for logging and can use environment variables
    flag_file_path = f"{self.test_cwd}/{self.spec.data_flagfile_name}"
    file = open(flag_file_path)
    for line in file:
        self.run_args.append(line.rstrip())
Comment style: start with an uppercase character and end with a period. Also fix the typo and adjust the wording.
-   # expand data flag file, so beter for logging and can use environment variables
+   # Expand data flag files to make logs explicit
+   # Tools accept `--flagfile=/path/to/flagfile` but logs are easier
+   # to read with explicit `--flag1=value1 --flag2=value2` flags.
    flag_file_path = f"{self.test_cwd}/{self.spec.data_flagfile_name}"
    file = open(flag_file_path)
    for line in file:
        self.run_args.append(line.rstrip())
Note that flagfiles are a requirement for some commands and environments. For example, certain terminals limit commands to roughly 512 characters, and putting flags in files works around that.
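As a rough sketch of that expansion with the suggested comment style and a context manager (the function name and signature here are hypothetical, not code from this PR):

    import os

    def expand_data_flagfile(test_cwd: str, data_flagfile_name: str, run_args: list[str]) -> None:
        # Expand the data flagfile so each flag shows up explicitly in logs
        # and can reference environment variables later in the pipeline.
        flag_file_path = os.path.join(test_cwd, data_flagfile_name)
        with open(flag_file_path) as flag_file:
            for line in flag_file:
                flag = line.strip()
                if flag:  # Skip blank lines.
                    run_args.append(flag)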
run_env["IREE_TEST_BACKEND"] = os.getenv( | ||
"IREE_TEST_BACKEND", default="none" | ||
) | ||
|
||
# expand environment variable for logging | ||
backend = run_env["IREE_TEST_BACKEND"] | ||
cmd = subprocess.list2cmdline(self.run_args) | ||
cmd = cmd.replace("${IREE_TEST_BACKEND}", f"{backend}") |
This is too sketchy IMO. Flagfiles should be runnable as-is and this is adding an extra indirection that will be too difficult to reproduce outside of a CI environment. Any tests needing this behavior should be using a mechanism other than this conftest.py.
Let me elaborate a bit...
This iree_tests subproject is home to:
- model sources
- test inputs
- test outputs
- utilities for converting into the test suite format
- utilities for helping run test suites
- simple examples of test suite instantiations
For large test suites following a standardized style (ONNX unit tests, ONNX models, StableHLO models, JAX programs, etc.), iree_tests/conftest.py and the config files there give us a lightweight way to define test cases that can be piped through iree-compile -> iree-run-module using a shared set of flags for each test configuration (compile options + run options).
For SDXL, SD3, llama, and other models that we're giving special attention, we should be testing both the out of the box import -> compile -> run path that fits that mold and a curated path like https://github.com/iree-org/iree/blob/main/experimental/regression_suite/tests/pregenerated/test_llama2.py or https://github.com/nod-ai/sdxl-scripts .
- We can keep a separation between the model sources here (including any scripts needed to get from frameworks, safetensors files, requirements.txt files, etc. down to .mlir and .irpa files) and downstream test instantiations - we'll just end up with more (I'm assuming Python) code downstream to set up exactly what we want tested.
At the point where a model needs a carve-out in a config.json file, an environment variable, a nonstandard file (spec file), or changes to conftest.py, it is too complex/different and should be given its own separate test.
- We saw some confusion today on Discord when one of these special tests was newly passing here. It took several of us some time to find which config file needed updating, since the test was running as part of "sdxl_scheduled_unet" and not "pytorch_models".
We can share test code between the standardized path and custom model tests where it makes sense to do so. In particular, the "compile a program" and "run a program" parts could be fixtures (as they are in https://github.com/iree-org/iree/blob/main/experimental/regression_suite/ireers/fixtures.py). We should have a common way for those stages to run, with the same logging format and the same error messages when tests are unexpectedly passing, newly failing, etc.
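A rough sketch of what shared compile/run fixtures could look like, loosely modeled on the linked ireers/fixtures.py (the fixture names and helper functions below are hypothetical, not the existing API):

    import subprocess
    import pytest

    @pytest.fixture
    def iree_compile():
        """Compile a .mlir source to a .vmfb with shared flags and logging."""
        def _compile(mlir_path, vmfb_path, extra_flags=()):
            cmd = ["iree-compile", str(mlir_path), "-o", str(vmfb_path), *extra_flags]
            print("Compiling:", subprocess.list2cmdline(cmd))
            subprocess.run(cmd, check=True)
            return vmfb_path
        return _compile

    @pytest.fixture
    def iree_run_module():
        """Run a compiled module and return the completed process for checks."""
        def _run(vmfb_path, extra_flags=()):
            cmd = ["iree-run-module", f"--module={vmfb_path}", *extra_flags]
            print("Running:", subprocess.list2cmdline(cmd))
            return subprocess.run(cmd, check=True, capture_output=True, text=True)
        return _run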
Yeah, this makes sense. I will take a look at an alternate path for the custom models.
We can still use the same downloading utilities (download_remote_files.py) to fetch all the sources though, right? We'll just have a different test_cases.json for the script to parse in the custom path?
We can keep the current downloading, or we could follow the fetch_source_fixture style in https://github.com/iree-org/iree/blob/6f178692f11d356a284eba915f2cf1d067d92a69/experimental/regression_suite/tests/pregenerated/test_llama2.py#L21-L24. If we're explicitly setting up test cases like in that file, then we could just fully script it all there, with no need for a separate .json file.
IDK, and I can't take a deep context switch right now to think through the design.
I'd start with that test_llama2.py and adapt it to the separate repo model (test sources, inputs, and outputs in one repo, test configurations in another).
Yeah, no worries. I'd rather stick to a unified downloading process (making use of the same tools) into a cache in the test suite, and unified compile/run through a fixture as you suggested, so we are using the same tools across the whole repo where we can. I will proceed with that for now.
This commit adds the necessary support for integrating tests for all the sd3 submodels.
Also refactored the test suite and made changes along the way to get everything working smoothly.
We no longer have input, output, and module flags coming from the json configs.
Enabled expansion of the real-weights flag file, so we can use environment variables while running the module. With this, I switched to using an environment variable so we can use just one real_weights file for sdxl-scheduled-unet (covering both the cpu and rocm pipeline vmfbs).
These changes allow us to group sd3 and sdxl testing into the same json configs for a clean integration :)
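To illustrate the environment-variable expansion described above, a minimal sketch (the flag and module filename are hypothetical placeholders, not the actual artifacts in the suite):

    import os
    import subprocess

    # The backend-specific part of a path is written as ${IREE_TEST_BACKEND}
    # in the shared real_weights flag file and substituted at run time.
    run_args = ["iree-run-module", "--module=sdxl_scheduled_unet_${IREE_TEST_BACKEND}.vmfb"]
    os.environ.setdefault("IREE_TEST_BACKEND", "rocm")
    backend = os.environ["IREE_TEST_BACKEND"]
    cmd = subprocess.list2cmdline(run_args).replace("${IREE_TEST_BACKEND}", backend)
    print(cmd)  # iree-run-module --module=sdxl_scheduled_unet_rocm.vmfb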
All the Azure artifacts for each submodel (inputs, outputs, weights) can be found in sharkpublic/sai/sd3-mmdit, sharkpublic/sai/sd3-vae, and sharkpublic/sai/sd3-prompt-encoder. The mlirs and flags are all set up here.