Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle type conversion errors in FilterByValueBlock #78

Merged
merged 2 commits into from
Jul 6, 2024

Conversation

russellb
Copy link
Member

@russellb russellb commented Jul 4, 2024

97d3fcd filterblock: Don't break when filtering on unexpected data
111711f Add unit tests for sdg.filterblock

commit 97d3fcd
Author: Russell Bryant [email protected]
Date: Wed Jul 3 19:53:23 2024 -0400

filterblock: Don't break when filtering on unexpected data

The previous code in this block did filtering assuming that all
samples had a value that was correct for the type. For example, when
filtering on an integer value, it assumed every row had a valid
integer, where it may instead have garbage.

This change introduces a new helper, _convert_dtype(), which properly
handles this condition. When the conversion fails on a `ValueError`
exception, it treats it as `None` instead of allowing the exception to
be raised up to the caller.

The fix was authored by Shiv in PR #72. I only pulled it out into a
standalone commit.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: shiv <[email protected]>

commit 111711fa419b53e2f31b63bcb91f829eacd55548
Author: Russell Bryant [email protected]
Date: Wed Jul 3 19:56:45 2024 -0400

Add unit tests for sdg.filterblock

This new file introduces some unit tests for the filterblock module.
A previous commit introduced a bug fix to ensure that unexpected data
does not cause this block to break. These tests exercise those fixes.
The tests introduced here will run automatically in CI and help ensure
that the problem does not reoccur.

Signed-off-by: Russell Bryant <[email protected]>

The previous code in this block did filtering assuming that all
samples had a value that was correct for the type. For example, when
filtering on an integer value, it assumed every row had a valid
integer, where it may instead have garbage.

This change introduces a new helper, _convert_dtype(), which properly
handles this condition. When the conversion fails on a `ValueError`
exception, it treats it as `None` instead of allowing the exception to
be raised up to the caller.

The fix was authored by Shiv in PR instructlab#72. I only pulled it out into a
standalone commit.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: shiv <[email protected]>
@russellb russellb mentioned this pull request Jul 4, 2024
@mergify mergify bot added the testing Relates to testing label Jul 4, 2024
@russellb
Copy link
Member Author

russellb commented Jul 4, 2024

I posted this as part of a discussion thread here: #72 (comment)

tests/test_filterblock.py Outdated Show resolved Hide resolved
@markmc
Copy link
Contributor

markmc commented Jul 4, 2024

Great, very clear. Well done on adding tests. lgtm

This new file introduces some unit tests for the filterblock module.
A previous commit introduced a bug fix to ensure that unexpected data
does not cause this block to break. These tests exercise those fixes.
The tests introduced here will run automatically in CI and help ensure
that the problem does not reoccur.

Signed-off-by: Russell Bryant <[email protected]>
@russellb russellb force-pushed the pr-72-alternative branch from 111711f to 0de3f62 Compare July 6, 2024 17:37
@russellb
Copy link
Member Author

russellb commented Jul 6, 2024

CI is passing, it was approved previously as part of #72, and @markmc reviewed this version that split it out and added tests, so I'm going to merge

@russellb russellb merged commit 6b32b48 into instructlab:main Jul 6, 2024
11 checks passed
@russellb russellb added this to the 0.1.0 milestone Jul 8, 2024
jwm4 pushed a commit to jwm4/sdg that referenced this pull request Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants