🐛 Bug
networkx seems to fail to compute the min cut for an MLP with two torchao.float8 linears and GELU, in bf16.
The script below works when the dtype is float32.
If the activation is ReLU, then I see a different error.
```
Traceback (most recent call last):
  File "/opt/pytorch/lightning-thunder/thunder/core/rematerialization.py", line 378, in find_cut
    _, (reachable, non_reachable) = nx.minimum_cut(g, "source", "sink")
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 4", line 3, in argmap_minimum_cut_1
  File "/usr/local/lib/python3.12/dist-packages/networkx/utils/backends.py", line 967, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/maxflow.py", line 454, in minimum_cut
    R = flow_func(flowG, _s, _t, capacity=capacity, value_only=True, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 8", line 3, in argmap_preflow_push_5
  File "/usr/local/lib/python3.12/dist-packages/networkx/utils/backends.py", line 967, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/preflowpush.py", line 422, in preflow_push
    R = preflow_push_impl(G, s, t, capacity, residual, global_relabel_freq, value_only)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/preflowpush.py", line 41, in preflow_push_impl
    detect_unboundedness(R, s, t)
  File "<class 'networkx.utils.decorators.argmap'> compilation 16", line 3, in argmap_detect_unboundedness_13
  File "/usr/local/lib/python3.12/dist-packages/networkx/utils/backends.py", line 967, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/utils.py", line 173, in detect_unboundedness
    raise nx.NetworkXUnbounded(
networkx.exception.NetworkXUnbounded: Infinite capacity path, flow unbounded above.
```
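For context: networkx raises NetworkXUnbounded whenever some source-to-sink path consists entirely of unbounded edges, and an edge with no capacity attribute (or an infinite one) counts as unbounded. A standalone sketch of that failure mode, independent of how find_cut actually builds its flow graph:

```python
# Minimal, standalone demonstration of the networkx failure mode.
# This is NOT Thunder's actual graph; it only shows the condition under
# which minimum_cut raises NetworkXUnbounded.
import networkx as nx

g = nx.DiGraph()
g.add_edge("source", "a", capacity=float("inf"))
g.add_edge("a", "sink")  # no "capacity" attribute -> treated as infinite

try:
    nx.minimum_cut(g, "source", "sink")
except nx.NetworkXUnbounded as exc:
    print(exc)  # Infinite capacity path, flow unbounded above.
```

So presumably, in the bf16 + float8 case, the flow graph built in find_cut ends up with at least one source-to-sink path on which every edge has infinite (or missing) capacity.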
To Reproduce

Steps to reproduce the behavior:

- Check out #1415, more specifically 0893fbe.

Code sample
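The original code sample was not captured in this copy of the issue; the following is a hypothetical reconstruction of the setup it describes (an MLP with two torchao float8 linears and GELU in bf16, jitted with Thunder). The dimensions, input shape, and use of convert_to_float8_training are assumptions, not the author's actual script.

```python
# Hypothetical reconstruction of the repro, not the original script.
# Assumed: torchao's float8 training conversion and thunder.jit;
# dimensions, bias settings, and input shape are made up.
import torch
import torch.nn as nn
import thunder
from torchao.float8 import convert_to_float8_training

dtype = torch.bfloat16  # reported to work with torch.float32

class MLP(nn.Module):
    def __init__(self, dim: int = 4096):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim, bias=False)
        self.act = nn.GELU()  # nn.ReLU() reportedly gives a different error
        self.fc2 = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

model = MLP().to(device="cuda", dtype=dtype)
convert_to_float8_training(model)  # swaps both nn.Linear modules for Float8Linear
jitted = thunder.jit(model)

x = torch.randn(16, 4096, device="cuda", dtype=dtype, requires_grad=True)
jitted(x).sum().backward()  # forward+backward exercises the rematerialization pass
```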
Environment

- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
Additional context

For this MLP with the nvfuser executor, I run into either NVIDIA/Fuser#3498 or this issue, depending on whether or not I apply the DCE implemented in 232328c.
We used to hit the same error with Thunder recomputation enabled (#1232). @riccardofelluga, since you hit this problem before, do you have a minimal reproducer for it? Do you know what changes are needed in the trace here?