-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: multithreaded file source breaks interpretation #1036
fix: multithreaded file source breaks interpretation #1036
Conversation
I added a fix but I'm not sure of the possible consequences of it. |
#1040 may be related (both use atlas file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congratulations on finding this!
I think this is the right fix: AsGrouped
branches are not expected to have any data (the TClonesArray
oddball case isn't AsGrouped
, I think), but somehow, some of them have baskets anyway. That was breaking the logic of the event loop because it passed when it acquired the expected total number of baskets, but some of the baskets that were collected were unexpectedly from an AsGrouped
parent branch. Since the data collection is concurrent, it was an intermittent failure because it depended on whether the unexpected parent branch baskets arrived before the child branch baskets or not.
I rearranged the logic of the if
and for
statements for readability, then verified that your test always fails without the fix and always passes with the fix.
I think this is ready to be merged; I'll let you decide if you need to make counter-edits before pressing the button.
I added a test that is able to reproduce this.
It's not a fsspec issue, its a race condition triggered when using multiple threads (also with the
uproot.source.file.MultithreadedFileSource
).When a branch with subbranches is interpreted, the subbranches are submitted first. If this is the order in which they are interpreted it works fine. When using a multithreaded source, this order may not be maintained and it breaks.