Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: multithreaded file source breaks interpretation #1036

Merged

Conversation

lobis
Copy link
Collaborator

@lobis lobis commented Nov 20, 2023

I added a test that is able to reproduce this.

It's not a fsspec issue, its a race condition triggered when using multiple threads (also with the uproot.source.file.MultithreadedFileSource).

When a branch with subbranches is interpreted, the subbranches are submitted first. If this is the order in which they are interpreted it works fine. When using a multithreaded source, this order may not be maintained and it breaks.

@lobis lobis linked an issue Nov 20, 2023 that may be closed by this pull request
@lobis lobis changed the title fix: fsspec source bug (https://github.com/scikit-hep/uproot5/issues/1035) fix: multithreaded file source breaks interpretation Nov 21, 2023
@lobis lobis requested review from jpivarski and nsmith- November 21, 2023 21:50
@lobis lobis marked this pull request as ready for review November 21, 2023 21:51
@lobis
Copy link
Collaborator Author

lobis commented Nov 21, 2023

I added a fix but I'm not sure of the possible consequences of it.

@lobis
Copy link
Collaborator Author

lobis commented Nov 22, 2023

#1040 may be related (both use atlas file)

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congratulations on finding this!

I think this is the right fix: AsGrouped branches are not expected to have any data (the TClonesArray oddball case isn't AsGrouped, I think), but somehow, some of them have baskets anyway. That was breaking the logic of the event loop because it passed when it acquired the expected total number of baskets, but some of the baskets that were collected were unexpectedly from an AsGrouped parent branch. Since the data collection is concurrent, it was an intermittent failure because it depended on whether the unexpected parent branch baskets arrived before the child branch baskets or not.

I rearranged the logic of the if and for statements for readability, then verified that your test always fails without the fix and always passes with the fix.

I think this is ready to be merged; I'll let you decide if you need to make counter-edits before pressing the button.

@lobis lobis enabled auto-merge (squash) November 24, 2023 20:26
@lobis lobis merged commit 8290c4c into main Nov 24, 2023
@lobis lobis deleted the 1035-fsspec-source-does-not-match-memmapsource-for-local-file branch November 24, 2023 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

multithreaded file source breaks interpretation
2 participants