fsspec Source (the local file one, at least) does not close files #1292

Closed
jpivarski opened this issue Sep 18, 2024 · 4 comments · Fixed by #1333
Labels
bug The problem described is something that must be fixed

Comments

@jpivarski
Member

I saw this in @giedrius2020's talk.

>>> import uproot
>>> import skhep_testdata
>>> with uproot.open(skhep_testdata.data_path("uproot-Zmumu.root")) as file:
...     tree = file["events"]
... 
>>> tree["px1"].array()
<Array [-41.2, 35.1, 35.1, 34.1, ..., 32.4, 32.4, 32.5] type='2304 * float64'>
>>> tree.file.closed
False

The with statement was supposed to close the file, making it impossible to later read the arrays from it. (Without this, the with statement is useless: its purpose is to prevent file-handle leaks, and nothing here is closing the file-handle.)
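For comparison, here is a minimal sketch of the contract a with statement is expected to honor, using Python's built-in open rather than Uproot. The temporary file and the variable names are illustrative assumptions, not part of the report above.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some bytes")
    path = tmp.name

# Reading inside the with block works as usual.
with open(path, "rb") as handle:
    first = handle.read(4)

# After the block, the handle reports that it is closed...
closed_after_with = handle.closed

# ...and any further read attempt fails instead of silently succeeding.
try:
    handle.read(4)
    read_error = None
except ValueError as err:
    read_error = str(err)

os.remove(path)
```

Both halves matter: the closed flag flips to True, and later reads raise an error. The Uproot session above shows neither.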

This is where the "file" object (a ReadOnlyDirectory) has an __exit__ method that gets called at the end of the with statement:

uproot5/src/uproot/reading.py

Lines 1515 to 1516 in dc19ce9

def __exit__(self, exception_type, exception_value, traceback):
    self._file.source.__exit__(exception_type, exception_value, traceback)

and here is where that gets propagated to an FSSpecSource:

def __exit__(self, exception_type, exception_value, traceback):
    self._file.__exit__(exception_type, exception_value, traceback)
    self._executor.shutdown()

and here's the pre-fsspec Source (MemmapSource):

def __exit__(self, exception_type, exception_value, traceback):
    if self._fallback is None:
        if hasattr(self._file._mmap, "__exit__"):
            self._file._mmap.__exit__(exception_type, exception_value, traceback)
        else:
            self._file._mmap.close()
    else:
        self._fallback.__exit__(exception_type, exception_value, traceback)

Pre-fsspec file-handles got closed:

>>> import uproot
>>> import skhep_testdata
>>> with uproot.open(skhep_testdata.data_path("uproot-Zmumu.root"), handler=uproot.MemmapSource) as file:
...     tree = file["events"]
... 
>>> tree["px1"].array()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/uproot5/src/uproot/behaviors/TBranch.py", line 1825, in array
    _ranges_or_baskets_to_arrays(
  File "/home/jpivarski/irishep/uproot5/src/uproot/behaviors/TBranch.py", line 3023, in _ranges_or_baskets_to_arrays
    hasbranches._file.source.chunks(ranges, notifications=notifications)
  File "/home/jpivarski/irishep/uproot5/src/uproot/source/file.py", line 163, in chunks
    raise OSError(f"memmap is closed for file {self._file_path}")
OSError: memmap is closed for file /home/jpivarski/.local/skhepdata/uproot-Zmumu.root
>>> tree.file.closed
True

Now I wonder why pytest is not complaining about the leaking file-handles. I know that it checks for unclosed files because I've seen that error. Yet the leak is reproducible on the terminal, and our test suite has never raised an error for it. I don't understand that.
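As a rough way to check for a leaked handle outside of pytest, the sketch below records the ResourceWarning that CPython emits when a file object is destroyed without being closed. It deliberately leaks a handle from the built-in open, not from Uproot, and it assumes CPython's reference-counting behavior; whether pytest surfaces such warnings as errors depends on the project's warning filters, which this example does not inspect.

```python
import gc
import os
import tempfile
import warnings

# Throwaway file, so the example does not depend on any test data.
fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ResourceWarning is ignored by default
    leaked = open(path, "rb")        # deliberately never closed
    del leaked                       # CPython finalizes the object here
    gc.collect()                     # extra nudge; harmless if already collected

# Keep only the warnings that report an unclosed resource.
resource_warnings = [
    w for w in caught if issubclass(w.category, ResourceWarning)
]

os.remove(path)
```

If resource_warnings stays empty for a handle that was expected to leak, that is a hint the handle was in fact closed somewhere.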

Socially, this could be a problem because the fsspec-based file-handles have been available for almost a year (Uproot 5) and users might have gotten used to opening the file in a with statement (uselessly) and later, outside the with, initiating more file-reading with TTree, RNTuple, and TDirectory objects. After this is fixed, code that relied on the bug will break. The fix will need to go into a new minor version, 5.4.0, at least.

jpivarski added the bug label Sep 18, 2024
@ariostas
Collaborator

ariostas commented Nov 5, 2024

I was curious about this, so here's what I found out.

The fsspec implementation is set up to always return False for the closed property.

# We need this because user may use other executors not defined in `uproot.source` (such as `concurrent.futures`)
# that do not have this interface. If not defined it defaults to calling this property on the source's executor
@property
def closed(self) -> bool:
    """
    True if the associated file/connection/thread pool is closed; False
    otherwise.
    """
    return False

But if you check the underlying file, tree.file.source._file.f.closed, you find that it was closed. It is weird that the file can still be read, since it's supposed to be closed.

I feel like the comment above explains the existence of the property, but not why it always returns False, so I'm guessing it was just a silly mistake. I can submit a PR to simply change it to return self._file.f.closed (when possible) if you agree.
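A minimal sketch of that idea, using a hypothetical stand-in class rather than Uproot's actual FSSpecSource: the closed property delegates to the wrapped handle's own flag when one is available and otherwise falls back to the current behavior of returning False. The class name, the SimpleNamespace wrapper that mimics the `_file.f` layout, and the fallback rule are all assumptions made for illustration.

```python
import io
from types import SimpleNamespace


class SketchSource:
    """Hypothetical stand-in for an fsspec-backed source (not Uproot's class)."""

    def __init__(self, file_wrapper):
        # Assumed layout: the wrapper exposes the raw handle as `.f`.
        self._file = file_wrapper

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        inner = getattr(self._file, "f", None)
        if inner is not None:
            inner.close()

    @property
    def closed(self) -> bool:
        # Report the underlying handle's state when it has one;
        # otherwise keep the old answer (False).
        inner = getattr(self._file, "f", None)
        return bool(getattr(inner, "closed", False))


source = SketchSource(SimpleNamespace(f=io.BytesIO(b"abc")))
closed_before = source.closed
with source:
    pass
closed_after = source.closed

# A wrapper without `.f` cannot be inspected, so the fallback applies.
closed_unknown = SketchSource(object()).closed
```

The getattr fallback is one reading of "when possible"; a real fix might prefer to raise or to track the state explicitly.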

@ariostas
Collaborator

ariostas commented Nov 5, 2024

The other issue is that FSSpecSource.chunks doesn't check if the file is closed.
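One way such a check could look, sketched on a hypothetical class backed by an in-memory buffer: every read path tests the closed flag first and raises OSError, mirroring the error that MemmapSource raises in the traceback earlier in this thread. The class, its method signatures, and the error message wording are assumptions, not Uproot's API.

```python
import io


class GuardedSource:
    """Hypothetical source whose read methods refuse to run once closed."""

    def __init__(self, raw, file_path):
        self._file = raw
        self._file_path = file_path

    @property
    def closed(self) -> bool:
        return self._file.closed

    def close(self):
        self._file.close()

    def chunk(self, start, stop):
        # Guard first, so a closed source fails loudly instead of reading.
        if self.closed:
            raise OSError(f"file is closed: {self._file_path}")
        self._file.seek(start)
        return self._file.read(stop - start)

    def chunks(self, ranges):
        # Each range goes through chunk(), so the guard covers this path too.
        return [self.chunk(start, stop) for start, stop in ranges]


source = GuardedSource(io.BytesIO(b"0123456789"), "example.root")
before = source.chunks([(0, 3), (5, 8)])

source.close()
try:
    source.chunks([(0, 3)])
    error_message = None
except OSError as err:
    error_message = str(err)
```

Putting the guard in the single-range method keeps the check in one place, at the cost of testing the flag once per range.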

Also, maybe now it makes sense to drop support for Python 3.8 and clean up that method a bit? It would probably make sense to do both for a minor release.

@jpivarski
Member Author

What we can do is make final releases of Awkward and Uproot with Python 3.8 support, and then immediately afterward make a minor release without Python 3.8 support, so that the change in Awkward version number only changes that level of support.

The question then becomes, which PRs do you want to be sure are included in the last with-Python-3.8 release?

@ariostas
Collaborator

ariostas commented Nov 5, 2024

> The question then becomes, which PRs do you want to be sure are included in the last with-Python-3.8 release?

I think it would make sense to at least wait for the fix/perf PRs that are ready (or almost ready). There's no reason to rush to drop Python 3.8, but I did notice that things have started to break for 3.8 (see, e.g., scikit-hep/scikit-hep-testdata#162).
