-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move dagster dev tests to core #27654
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
20c3b36
to
6250a23
Compare
8841273
to
e2d78a6
Compare
e2d78a6
to
d3c450d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does moving these tests do to the runtime for the associated step in BK?
thanks for digging in to this - unfortunate that there is so much divergence for windows but im not sure theres any better way to deal with it
@@ -211,6 +211,14 @@ def interrupt_ipc_subprocess(proc: "Popen[Any]") -> None: | |||
proc.send_signal(signal.SIGINT) | |||
|
|||
|
|||
def interrupt_or_kill_ipc_subprocess(proc: "Popen[Any]", wait_time: int = 10) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: _then_
instead of _or_
maybe
python_modules/dagster/tox.ini
Outdated
@@ -35,7 +40,7 @@ commands = | |||
|
|||
api_tests: pytest -c ../../pyproject.toml -vv ./dagster_tests/api_tests {env:COVERAGE_ARGS} --durations 10 {posargs} | |||
asset_defs_tests: pytest -c ../../pyproject.toml -vv ./dagster_tests/asset_defs_tests {env:COVERAGE_ARGS} --durations 10 {posargs} | |||
cli_tests: pytest -c ../../pyproject.toml -vv ./dagster_tests/cli_tests {env:COVERAGE_ARGS} --durations 10 {posargs} | |||
cli_tests: pytest -x -c ../../pyproject.toml -vv ./dagster_tests/cli_tests/ {env:COVERAGE_ARGS} --durations 10 {posargs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean this up i assume
# Skip windows here-- it behaves strangely with respect to child processes and cmdline(). But we | ||
# are still checking that all child processes shut down later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue is we just cant exclude the tracker processes? whats up with cmdline
? I think its worth being more specific here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I forgot to fully clean this up and had some questions for you about it:
- I called these "tracker processes" as a kind of guess just to have a name
but I'm not totally sure where they're coming from. I thought maybe from the watch threads of the workspace but those are threads and not separate processes.I think they're from the sensor that issues aRunRequest
. They look like this and the number of them varies per test run (I've seen 0, 2, 3):
PID [69896] PPID [69883]: ['/Users/smackesey/stm/code/elementl/oss/.venv/bin/python3', '-c', 'from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=22)', '--multiprocessing-fork']
- I will definitely elaborate the error message. There are two issues on windows:
- The transient nature of the "tracker processes" leads to situations where they exist at the time the child processes are gathered but are dead when we check the
cmdline()
. Checkingcmdline()
of a dead process throws an error on Windows (only). - For some reason there are ~2x the processes on Windows when running through tox. I think each process we're launching directly goes through a tox shim, which is it's own persistent process that invokes the "real" target process.
- The transient nature of the "tracker processes" leads to situations where they exist at the time the child processes are gathered but are dead when we check the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the multiprocessing
module is used extra processes are spawned to track certain cross process data structures, so i think you named it those well. No idea whats up with the tox shim thing.
Having to skip out on some of this validation on windows is unfortunate but getting the functionality under tests is a big improvement I am not sure its worth the time it would take to try to improve past what you've cooked up here.
Best thing to do is just leave as much context as you can for the next poor soul who finds them self in this abyss.
Increases by a few minutes, but it's still not the slowest
One way would be to use dedicated IPC (pipe or socket) to send the shutdown signal to the child processes instead of signals. But I don't think there's an existing channel for this. |
the code servers can get shutdown with this i think |
d3c450d
to
f5a7070
Compare
## Summary & Motivation Our only `dagster dev` tests currently live in the daemon integration test suite, which isn't run on windows. This moves them to `dagster`, which is tested on windows. The tests are mostly moved over as they were, with a few adjustments: - Instead of loading the toys repo, we now just load the simple test repo loaded in the rest of the tests. The toys repo adds a bunch of dependencies but loading it specifically doesn't really test anything relevant. - All tests have been augmented to ensure all child processes are shut down (on Unix at least-- see below) - Some refactoring to make the tests simpler to read. Previously we weren't testing this on Windows at all, and I went in thinking we would end with exactly the same tests passing on windows and unix. Unfortunately, that's not _quite_ true-- with our current set up I could not find a way to reliably and cleanly shut down the `dagster dev` process tree on Windows in a CI environment. This has to do with the differences in signal handling and process groups in Windows vs unix. I will quote ChatGPT-o1: >On Windows, CTRL_BREAK_EVENT (and similarly CTRL_C_EVENT) only works when the calling process and the target processes share a console. By default, Python’s `subprocess.Popen` with CREATE_NEW_PROCESS_GROUP spawns the child into its own new process group but does not automatically attach it to the same console as the parent (and in many CI environments, there may be no real console at all). As a result, sending CTRL_BREAK_EVENT to that child process group essentially does nothing—Windows never delivers the interrupt to those child processes, and they keep running. The lack of console in the CI environment means that `CTRL_BREAK_EVENT`, which we use to shut down the child processes, does not propagate directly from `dev` to the webserver/daemon (but it does appear to work when `dagster dev` is used interactively, due to sharing a console). So in the Windows tests I ended up explicitly killing all child processes. We could maybe make this more robust in the future by having the `dev` process send a shutdown signal via another form of IPC, but I couldn't see how to do that here. ## How I Tested These Changes Ran moved `dagster dev` tests on windows and unix.
## Summary & Motivation Our only `dagster dev` tests currently live in the daemon integration test suite, which isn't run on windows. This moves them to `dagster`, which is tested on windows. The tests are mostly moved over as they were, with a few adjustments: - Instead of loading the toys repo, we now just load the simple test repo loaded in the rest of the tests. The toys repo adds a bunch of dependencies but loading it specifically doesn't really test anything relevant. - All tests have been augmented to ensure all child processes are shut down (on Unix at least-- see below) - Some refactoring to make the tests simpler to read. Previously we weren't testing this on Windows at all, and I went in thinking we would end with exactly the same tests passing on windows and unix. Unfortunately, that's not _quite_ true-- with our current set up I could not find a way to reliably and cleanly shut down the `dagster dev` process tree on Windows in a CI environment. This has to do with the differences in signal handling and process groups in Windows vs unix. I will quote ChatGPT-o1: >On Windows, CTRL_BREAK_EVENT (and similarly CTRL_C_EVENT) only works when the calling process and the target processes share a console. By default, Python’s `subprocess.Popen` with CREATE_NEW_PROCESS_GROUP spawns the child into its own new process group but does not automatically attach it to the same console as the parent (and in many CI environments, there may be no real console at all). As a result, sending CTRL_BREAK_EVENT to that child process group essentially does nothing—Windows never delivers the interrupt to those child processes, and they keep running. The lack of console in the CI environment means that `CTRL_BREAK_EVENT`, which we use to shut down the child processes, does not propagate directly from `dev` to the webserver/daemon (but it does appear to work when `dagster dev` is used interactively, due to sharing a console). So in the Windows tests I ended up explicitly killing all child processes. We could maybe make this more robust in the future by having the `dev` process send a shutdown signal via another form of IPC, but I couldn't see how to do that here. ## How I Tested These Changes Ran moved `dagster dev` tests on windows and unix.
Summary & Motivation
Our only
dagster dev
tests currently live in the daemon integration test suite, which isn't run on windows. This moves them todagster
, which is tested on windows.The tests are mostly moved over as they were, with a few adjustments:
Previously we weren't testing this on Windows at all, and I went in thinking we would end with exactly the same tests passing on windows and unix. Unfortunately, that's not quite true-- with our current set up I could not find a way to reliably and cleanly shut down the
dagster dev
process tree on Windows in a CI environment. This has to do with the differences in signal handling and process groups in Windows vs unix. I will quote ChatGPT-o1:The lack of console in the CI environment means that
CTRL_BREAK_EVENT
, which we use to shut down the child processes, does not propagate directly fromdev
to the webserver/daemon (but it does appear to work whendagster dev
is used interactively, due to sharing a console). So in the Windows tests I ended up explicitly killing all child processes. We could maybe make this more robust in the future by having thedev
process send a shutdown signal via another form of IPC, but I couldn't see how to do that here.How I Tested These Changes
Ran moved
dagster dev
tests on windows and unix.