-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(arrow-ord): remove duplicate logic and make it easier to add tests #6913
base: main
Are you sure you want to change the base?
Conversation
Thank you for these PRs @rluvaton -- I know we are seriously behind on PR review in arrow-rs. I hope that we will have more capacity after next week. We just need to find more people willing to help out with reviews |
I'm willing to help, although currently my codebase knowledge is not the greatest but I think it's getting better |
Thank you 🙏 -- the most helpful thing would be to review outstanding PRs and ensure the description is understandable, the code is commented and covered with tests, and you think the code does what the description says If you find such PRs, feel free to @ mention me and I'll try and give it a look asap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea; I've left some suggestions to see if we can simplify the code even further.
fn test_sort_arrays< | ||
ListItemType: Clone, | ||
IntoListFn: Fn(Vec<Option<ListItemType>>) -> Vec<ArrayRef>, | ||
>( | ||
into_lists_fn: IntoListFn, | ||
data: Vec<Option<ListItemType>>, | ||
options: Option<SortOptions>, | ||
limit: Option<usize>, | ||
expected_data: Vec<Option<ListItemType>>, | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn test_sort_arrays< | |
ListItemType: Clone, | |
IntoListFn: Fn(Vec<Option<ListItemType>>) -> Vec<ArrayRef>, | |
>( | |
into_lists_fn: IntoListFn, | |
data: Vec<Option<ListItemType>>, | |
options: Option<SortOptions>, | |
limit: Option<usize>, | |
expected_data: Vec<Option<ListItemType>>, | |
) { | |
fn test_sort_arrays<ListItemType, IntoListFn>( | |
into_lists_fn: IntoListFn, | |
data: Vec<Option<ListItemType>>, | |
options: Option<SortOptions>, | |
limit: Option<usize>, | |
expected_data: Vec<Option<ListItemType>>, | |
) where | |
ListItemType: Clone, | |
IntoListFn: Fn(Vec<Option<ListItemType>>) -> Vec<ArrayRef>, | |
{ |
Thoughts on using where bound here to make it a bit more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was missed from the changes you applied
There's some CI errors to look into. Beyond that, I'm confused by some of the reasoning provided in the body:
I find it unlikely that Of course if a trait based approach is more verbose/complex than the current function based approach I can understand why this approach was chosen. |
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
Rationale for this change
While working on #6911 and I started to write tests, I found out I just duplicated the logic, and it was messier, so cleaning up to make it easier to create new tests.
Background: all sort arrays test helper were given 2 vectors with native rust types (e.g.
Vec<Option<bool>>
), one for the input and the other for the expected output.and the sort test helper convert it to the input and expected arrays to Arrow arrays (e.g.
BooleanArray
), and then callsort
(orsort_limit
depend on the input) and assert that the expected match the sorted.this is done to help make the test simpler by just providing what the actual arrow array represent without messing with converting to arrow type.
the problem is that some Rust types can have multiple Arrow types, for example
Vec<Option<Vec<u8>>>
can beBinaryArray
orLargeBinaryArray
orFixedSizeBinaryArray
orBinaryViewArray
(same for lists and string).in that case each sort helper function convert the input data to some of the arrow type (like
FixedSizeBinaryArray
) or to multiple arrow type (BinaryArray
andLargeBinaryArray
)and then do the same as described above.
for the cases that
fixed_length
was provided it will testFixedSizeBinaryArray
The problem with this is that there are a lot of duplication and adding tests to new type is just adding a lot of boilerplate instead of just starting to write code
What changes are included in this PR?
So instead of this approach I created a single
test_sort_array
function that is given theinput
,expected
and a function to convert that rust input type (e.g.Vec<Option<Vec<u8>>>
like explained above) to all variants from that type (e.g.BinaryArray
andLargeBinaryArray
andFixedSizeBinaryArray
andBinaryViewArray
)then created for each test type a new helper to convert that rust type to arrow type
for creating binary array I returned
FixedSizeBinaryArray
if all element have the same size, this will no allow to avoid only testing fixed or variable length types.and moved all the build array helpers to sub module for more organize code.
But why do I pass a function to build the array and not using Rust trait system? this will surely be cleaner as you can have the actual function implementation hidden behind the input type and it would make the test cleaner.
You (basically my other self) are right. The problem is that I tried and it can't work without implementing for each primitive type manually (I think), because let's see the following example:
Let's say I have a trait called
ConvertToArray
and then I implement it for primitive type to create primitive array:
all good and it works. the problem is I can't implement it for
bool
when I want to test sortingBooleanArray
becauseArrowPrimitiveType
can be implemented in the future forbool
and this would cause duplicate implementation. and for that reason Rust disallow thatAre there any user-facing changes?
nope, only tests