Tkurth/extended distributed primitives #273
Conversation
Concerning the changelog, how does that need to be updated? Doesn't that also depend on what was merged between this PR and the other PRs that came before it but after I forked the branch?
Signed-off-by: Thorsten Kurth <[email protected]>
Force-pushed from d7bdb57 to f4fbd15
Looks good to me. Only had a couple of relatively minor comments.
It would be good to also ensure that all the existing distributed tests pass locally by running `pytest -m multigpu` in the `test/` directory, since these tests are not currently covered by CI.
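For context, `pytest -m multigpu` only works if the `multigpu` marker is registered. A minimal sketch of how such a marker can be registered in a `conftest.py` is shown below; this is illustrative only and may not match the actual Modulus test configuration.

```python
# conftest.py -- hypothetical sketch of registering a "multigpu" marker
# so that `pytest -m multigpu` selects only the distributed tests.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "multigpu: tests that require multiple GPUs"
    )
```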
Signed-off-by: Thorsten Kurth <[email protected]>
…ub.com/azrael417/modulus into tkurth/extended-distributed-primitives
Sorry, forgot to approve this earlier. Thanks for addressing the comment w.r.t. unified utilities. LGTM now.
/blossom-ci
I ran the multigpu tests but ran into some issues. First, there is an assert that num_gpu == 2 (not >= 2), so these tests fail on my DGX Station with 4 GPUs. Can we relax that criterion a bit? Working around it with CUDA_VISIBLE_DEVICES I can run some of the tests, but the meshgraphnet one fails; I think that is not related to this MR:

```
models/meshgraphnet/test_meshgraphnet_snmg.py FFF                        [100%]
=================================== FAILURES ===================================
dtype = torch.float32
models/meshgraphnet/test_meshgraphnet_snmg.py:193:
../../.conda/envs/modulus/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:246: in spawn
self = <torch.multiprocessing.spawn.ProcessContext object at 0x7f7f41a258a0>
```

These here look good:
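One way to relax the hard GPU-count check is to skip (or cap the number of ranks) instead of asserting an exact count. The sketch below is a hypothetical helper, not the actual assertion in the Modulus tests; names such as `require_gpus` are illustrative.

```python
# Hypothetical sketch of relaxing a hard world-size check in a distributed test.
import pytest
import torch

def require_gpus(min_gpus: int = 2) -> int:
    """Skip the test unless at least `min_gpus` GPUs are visible; return the rank count to use."""
    available = torch.cuda.device_count()
    if available < min_gpus:
        pytest.skip(f"requires at least {min_gpus} GPUs, found {available}")
    # Use exactly min_gpus ranks so machines with 4 or 8 GPUs (e.g. a DGX Station) still pass.
    return min_gpus
```

Alternatively, the workaround mentioned above also works: restrict the visible devices from the environment, e.g. by setting CUDA_VISIBLE_DEVICES=0,1 before invoking pytest.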
Yeah, the meshgraphnet failure is not related to this MR and is an independent issue. I created a separate issue to track it: #278
/blossom-ci |
I actually started using things like …
Modulus Pull Request
Description
This PR enables gathering of tensors with uneven shapes. This is necessary for integrating Modulus into newer versions of Makani. Some of the routines could be merged with the V-routines for the graph NN code; I haven't done that yet, but I am happy to discuss it.
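For context, gathering tensors of uneven shapes typically requires exchanging the per-rank sizes first, padding to a common size, and trimming after the collective. The sketch below shows that general pattern with plain `torch.distributed`; it is an illustration only and is not the Modulus primitive added in this PR.

```python
# Minimal sketch of gathering variably-sized 1-D tensors across ranks
# with torch.distributed (assumes a process group is already initialized).
import torch
import torch.distributed as dist

def all_gather_uneven(tensor: torch.Tensor) -> list:
    """Gather 1-D tensors whose lengths differ across ranks."""
    world_size = dist.get_world_size()

    # 1. Exchange the local lengths so every rank knows how much to trim later.
    local_len = torch.tensor([tensor.shape[0]], device=tensor.device)
    lengths = [torch.zeros_like(local_len) for _ in range(world_size)]
    dist.all_gather(lengths, local_len)
    lengths = [int(l.item()) for l in lengths]

    # 2. Pad the local tensor to the maximum length so all_gather sees equal shapes.
    max_len = max(lengths)
    padded = torch.zeros(max_len, dtype=tensor.dtype, device=tensor.device)
    padded[: tensor.shape[0]] = tensor

    # 3. Gather the padded tensors and trim each one back to its true length.
    gathered = [torch.zeros_like(padded) for _ in range(world_size)]
    dist.all_gather(gathered, padded)
    return [g[:n] for g, n in zip(gathered, lengths)]
```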
Checklist
Dependencies
No new dependencies necessary