use env to skip PJRT initialize #8609
Conversation
Suggest landing this after the libtpu change is approved.
@tengyifei, shall we cherry-pick this PR to the 2.6 release?
@zpcore cherry-picking is fine with me.
The test failed. I think it is due to pytorch/pytorch#142859. They have reverted the PR.
Ack
Retrospective LGTM!
Thanks @zpcore - can we add enough detail to PR descriptions to help folks without context understand the intent of the contribution more clearly, please?
The issue in the multipod run is that MegaScale XLA (MXLA) triggers device discovery when we initialize the PJRT runtime with the TPU backend. With the introduction of the Pallas kernel, we trigger MXLA an extra time. The hacky way to fix the issue is to use an environment variable to control client creation:

```cpp
const char* skip_megascale_pjrt_client = std::getenv("SKIP_MEGASCALE_PJRT_CLIENT");
bool skip_megascale = false;
if (skip_megascale_pjrt_client != nullptr) {
  skip_megascale = true;
}
if (absl::GetFlag(FLAGS_megascale_num_slices) != 1 && !skip_megascale) {
  client = xla::MegaScalePjRtClient::CreateMegaScalePjRtClient(
      std::move(tpu_client));
  ...
}
```

With this fix, MegaScalePjRtClient will only be created in one place (e.g., xla/torch_xla/core/xla_model.py, line 93 at commit 82d3504), where we call runtime::GetComputationClient() and initialize the client.
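For illustration, the gating logic in the C++ snippet above can be sketched in Python. The names `should_skip_megascale` and `choose_client` are hypothetical helpers, not part of the codebase; the point is that only the *presence* of the variable matters, not its value (mirroring `std::getenv` returning non-null):

```python
import os


def should_skip_megascale(env=None):
    """Mirror of the C++ std::getenv check: the mere presence of
    SKIP_MEGASCALE_PJRT_CLIENT (any value, even empty) disables
    the MegaScale client."""
    env = os.environ if env is None else env
    return env.get("SKIP_MEGASCALE_PJRT_CLIENT") is not None


def choose_client(num_slices, env=None):
    """Return which client the C++ snippet above would create."""
    env = os.environ if env is None else env
    if num_slices != 1 and not should_skip_megascale(env):
        return "MegaScalePjRtClient"
    return "PjRtClient"
```

So in a 4-slice multipod run, setting the variable forces the plain PjRtClient, while a single-slice run never creates the MegaScale client regardless.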
We skip the PJRT MegaScale initialization by controlling an environment variable.
This is a temporary fix and is supposed to be rolled back.
Check #8609 (comment) for the detailed motivation.
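A minimal usage sketch of the workaround follows. The exact placement is an assumption: the variable just needs to be set in the process environment before the PJRT client is initialized, so that the C++ `std::getenv` call observes it:

```python
import os

# Set the flag before any torch_xla / PJRT initialization runs in this
# process; only the variable's presence is checked, so any value works.
os.environ["SKIP_MEGASCALE_PJRT_CLIENT"] = "1"
```

Equivalently, it can be exported in the shell before launching the training script.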