Investigate Dataset.map() multiprocessing failure #123
Comments
For reference, the type conversion failure handling case was introduced in #78.
A tiny bit of searching led me to stuff like huggingface/datasets#3195 and huggingface/datasets#3676, giving the impression that None is supposed to be handled, but it wouldn't surprise anyone if there were still latent bugs in None handling.
From @russellb
I did consider using a different default value than None - any value of that type that is not in filter_value would work, so we could randomly choose one? Ugh. I also considered adding an "invalid_value" field, but adding something like that to the format to work around a bug? Ugh.
This change looks related: https://github.com/aakankshaduggal/sdg/pull/7/files It's just dropping the samples converted to
From #110 I'm guessing it is to avoid:
i.e. what we default to on error needs to be a valid value for any of the supported operators.
So, no ... it has to be a value of that type which will cause the operator to return False.
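To make that constraint concrete, here is a rough sketch. It is not the code from filterblock.py: `convert_value`, `keep_sample`, and the `score` column are hypothetical names made up for illustration. It shows why a single sentinel default is awkward (the same sentinel gives False for `operator.eq` but True for `operator.ne`) and what the alternative of dropping samples whose conversion failed could look like.

```python
import operator


def convert_value(value, convert_dtype):
    """Hypothetical helper: convert a value, returning None on failure."""
    try:
        return convert_dtype(value)
    except (TypeError, ValueError):
        return None


def keep_sample(sample, column, op, filter_value, convert_dtype=None):
    """Hypothetical filter predicate for one sample.

    Samples whose conversion failed are dropped up front, so the operator
    never sees a placeholder value.
    """
    value = sample[column]
    if convert_dtype is not None:
        value = convert_value(value, convert_dtype)
    if value is None:
        return False
    return any(op(value, fv) for fv in filter_value)


# A sentinel that "is not in filter_value" only works for some operators.
sentinel = 0
sentinel_with_eq = operator.eq(sentinel, 2)  # False: sample would be dropped
sentinel_with_ne = operator.ne(sentinel, 2)  # True: bad sample would be kept

samples = [{"score": "2"}, {"score": "oops"}, {"score": "5"}]
kept_eq = [s for s in samples if keep_sample(s, "score", operator.eq, [2], int)]
kept_ne = [s for s in samples if keep_sample(s, "score", operator.ne, [2], int)]
```

With this approach the `"oops"` sample is excluded regardless of which operator is used, which a single default value cannot guarantee.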
Pretty sure this is resolved by #143.
This issue is called out in one of the commits in PR #117
The related code in filterblock.py as of that PR is:
We need to investigate this error more deeply to figure out the best fix.