[dagster cli] Add option to dagster definitions validate to load in-process #27459

Open
wants to merge 3 commits into
base: master

Conversation

benpankow
Member

@benpankow benpankow commented Jan 31, 2025

Summary

By default, dagster definitions validate starts up a gRPC server to load the code location. This adds a decent amount of runtime to the command (locally, for me, about 1s of a total 2.6s runtime).

This PR instead defaults to loading the code location in-process, unless pointing at multiple code locations (via workspace.yaml) or a location which specifies an executable_path. This makes the command pretty quick, with the runtime almost entirely dominated by importing the user's code and loading the definitions (around 1.6s for a full components project for me).
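
The selection rule described above can be sketched roughly as follows. This is only an illustration of the stated conditions, not the code in this PR: `LocationSpec` and `should_load_in_process` are hypothetical names and are not part of Dagster's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LocationSpec:
    """Hypothetical stand-in for one code location entry (not a Dagster class)."""

    name: str
    # Mirrors the executable_path setting mentioned in the summary.
    executable_path: Optional[str] = None


def should_load_in_process(locations: Sequence[LocationSpec]) -> bool:
    """Return True only for a single location with no custom executable_path.

    Anything else (zero or several locations, or a location that names its own
    Python executable) is treated as a case for the gRPC code server.
    """
    if len(locations) != 1:
        return False
    return locations[0].executable_path is None
```

Under these assumptions, a lone location with no `executable_path` takes the in-process route, and every other configuration falls back to the gRPC server.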

How I Tested These Changes

Updated unit tests.

Changelog

The dagster definitions validate command now loads code locations in-process by default, which reduces its runtime.


@benpankow benpankow force-pushed the benpankow/definitions-validate-no-grpc branch from 464d815 to 42c2430 Compare January 31, 2025 20:06
Collaborator

@smackesey smackesey left a comment


Is there any reason not to just always load them in-process?

@benpankow
Member Author

> Is there any reason not to just always load them in-process?

I think the main case in-process can't handle is validating multiple code locations w/ potentially different Python environments. The other thing is that locations can specify a Python executable, which in-process loading won't handle either. I should explicitly test these cases and disallow the flag if they're present.

@smackesey
Collaborator

> I think the main case in-process can't handle is validating multiple code locations w/ potentially different Python environments. The other thing is that locations can specify a python executable, that this won't address. Should explicitly test these cases & disallow the flag if they're present.

I see. This is good, but I have a strong suspicion most users won't benefit if it's not the default behavior. Also, it seems like even if different code locations have different Python environments, we still shouldn't need to start a code server. Couldn't we just resolve the Python executable and run an in-process validation using the resolved executable? Seems like it would simplify things.
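
One possible reading of this suggestion, sketched under assumptions: launch a short-lived subprocess with the location's resolved interpreter and let it do the loading, instead of starting a code server. The helper name `validate_with_executable` is hypothetical, and "validation" here is reduced to importing a module by name, which is a stand-in for whatever checks the real command performs.

```python
import subprocess
import sys
from typing import Optional


def validate_with_executable(module_name: str, executable_path: Optional[str] = None) -> bool:
    """Import `module_name` in a child process run by the resolved interpreter.

    Falls back to the current interpreter when no executable_path is given.
    Returns True if the child process exits cleanly, False otherwise.
    """
    python = executable_path or sys.executable
    # The child only imports the module; a non-zero exit code means the import failed.
    code = f"import importlib; importlib.import_module({module_name!r})"
    result = subprocess.run([python, "-c", code], capture_output=True, text=True)
    return result.returncode == 0
```

Each location would be checked in its own environment this way, at the cost of one interpreter startup per location rather than a gRPC server.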

@benpankow benpankow requested a review from alangenfeld January 31, 2025 22:26
@benpankow benpankow requested a review from gibsondan January 31, 2025 22:28
@benpankow
Member Author

benpankow commented Jan 31, 2025

Updated to default to loading in-process, with the above exceptions.

Tagging @alangenfeld and @gibsondan since they might have a better idea if there are other nuances to be wary of here.

@benpankow benpankow force-pushed the benpankow/definitions-validate-no-grpc branch from 42c2430 to a0c8cae Compare January 31, 2025 22:28
@benpankow benpankow requested a review from smackesey January 31, 2025 22:56


@contextmanager
def get_auto_determined_workspace_from_kwargs(
Collaborator

@smackesey smackesey Jan 31, 2025


If we have this, do we even need to give the users the --load-with-grpc option? IIUC we can just detect which method to use and fall back to grpc when necessary. If we do it this way then maybe we can eliminate it entirely later.

Just trying to avoid adding API surface.

Member Author


Yeah, that's fair. I figured it's sort of nice to have the flag to force consistently using gRPC if for some reason a user wants to make sure the command does the same thing each time, but I don't feel strongly.

Collaborator


IMO the method used to load the definitions should be treated as an implementation detail. If there are errors that would somehow only be found from gRPC startup, I would think that those errors are outside the scope of the command's expected output.

Maybe I'm wrong though, will let others weigh in.

Member

@schrockn schrockn left a comment


Requesting changes to resolve the discussion raised by @smackesey. My inclination is to remove the gRPC server option entirely as well, but I would like @maximearmstrong and @gibsondan to weigh in.
