Opening more than 256 files or file descriptors at once? #63
Hey, thanks for the report! Macs do have relatively low default file descriptor limits compared to Linux, as you say. I'm not sure why the parquet library opens so many files. Is the parquet file something you can share? I'm happy to leave this open, but especially since there's a workaround (adjusting ulimit), I probably won't get around to looking into this for a while.
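For reference, the ulimit workaround mentioned here can also be applied from inside a process rather than in the shell. The sketch below is a minimal illustration using Python's standard `resource` module; the helper name `raise_nofile_soft_limit` is hypothetical and not part of dsq or parquet-go. It only raises the soft limit (never above the hard limit) and, like `ulimit -n`, affects just the current process and children it starts afterwards.

```python
import resource

def raise_nofile_soft_limit(target: int) -> tuple[int, int]:
    """Hypothetical helper: try to raise this process's soft open-file limit
    (RLIMIT_NOFILE) to `target`, the in-process analogue of `ulimit -n <target>`.
    Returns the (soft, hard) limits in effect afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may never exceed the hard limit
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # The OS refused the value (macOS, for example, can reject values
            # above its per-process maximum); keep the current limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, `print(raise_nofile_soft_limit(4096))` would print the limits after the attempt. As noted in the thread, this only hides the symptom; it does not explain why so many descriptors are needed in the first place.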
Sorry, I can't share the contents of that specific file, but OSX Instruments tells me that The file contains around 1500 columns in its original .tsv form. I'm not sure I'll have time to put together a reproducer with non-private data, but I hope that serves as some kind of hint about what might be going wrong. Adjusting ulimit seems like a bad patch for what seems to be an underlying library issue, doesn't it?
Yeah, that doesn't seem great, but I'm not familiar with parquet internals or with the internals of the particular library datastation/dsq uses. If you'd like, you can open a bug report with https://github.com/xitongsys/parquet-go right now and ask them. When I get around to finding a case they can reproduce this behavior with, I'll open a bug ticket myself (if you haven't by then).
The default upper limit on file descriptors as of OSX Monterey 12.4 seems to be 256 (a bit low compared with Linux's, which is 1024, IIRC), but regardless:
How come so many (intermediate?) files are required to open a regular .parquet file of around ~200KB?
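One way to investigate the question above without sharing private data is to watch how many descriptors a process holds while it reads. The minimal sketch below assumes macOS or Linux, where `/dev/fd` lists the calling process's open descriptors; the helpers `count_open_fds` and `fds_opened_by` are hypothetical names for illustration, not part of dsq or parquet-go. The same before/after comparison could be wrapped around a parquet read: if a reader kept, say, one handle per column open at once (an unverified hypothesis, not something this thread establishes), a file with ~1500 columns would need far more than 256 descriptors.

```python
import os
from contextlib import ExitStack

def count_open_fds() -> int:
    # /dev/fd lists this process's open descriptors on macOS and Linux.
    # os.listdir() itself holds one descriptor while reading, so the absolute
    # number is inflated by one; differences between two calls are exact.
    return len(os.listdir("/dev/fd"))

def fds_opened_by(paths: list[str]) -> int:
    """Hypothetical helper: open every path at once and report how many
    new descriptors that consumed; all files are closed before returning."""
    before = count_open_fds()
    with ExitStack() as stack:
        for path in paths:
            stack.enter_context(open(path, "rb"))
        return count_open_fds() - before
```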