fix: filter out null values when sampling for index training (#3404)
We were not filtering out null values when sampling. Because we often call `array.values()` on Arrow arrays, which ignores the null bitmap, we were often silently treating nulls as zeros (or possibly undefined values). The only thing that caught these nulls was an assertion. However, with the L2 and Cosine distance types, residualization often transformed these values before the assertion ran, so the null information was lost, which is why the bug got past previous unit tests.

This PR adds more assertions validating that there are no nulls, and makes sure the sampling code handles null vectors.

Closes #3402
Closes #3400
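A minimal, std-only Rust sketch of the fix follows. It is not Lance's actual code: `VectorColumn`, `XorShift64`, and `sample_non_null` are hypothetical names, and a `Vec<bool>` stands in for Arrow's validity bitmap. The example shows why reading the raw value buffer is unsafe for nullable vectors (null slots still occupy space and hold zeros or arbitrary data) and drops null rows *before* drawing the random sample, so they can never reach index training.

```rust
/// Hypothetical stand-in for a nullable Arrow FixedSizeList<Float32> column:
/// a flat value buffer plus one validity flag per row (vector).
struct VectorColumn {
    dim: usize,
    values: Vec<f32>,    // len == num_rows * dim; null rows still occupy slots
    validity: Vec<bool>, // true = valid row, false = null row
}

impl VectorColumn {
    fn num_rows(&self) -> usize {
        self.validity.len()
    }
    fn row(&self, i: usize) -> &[f32] {
        &self.values[i * self.dim..(i + 1) * self.dim]
    }
    fn null_count(&self) -> usize {
        self.validity.iter().filter(|v| !**v).count()
    }
}

/// Tiny deterministic xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Sample up to `n` non-null vectors without replacement.
/// Null rows are filtered out first, so their buffer contents are never read.
fn sample_non_null(col: &VectorColumn, n: usize, seed: u64) -> Vec<Vec<f32>> {
    // Collect indices of valid rows only.
    let mut valid: Vec<usize> = (0..col.num_rows()).filter(|&i| col.validity[i]).collect();
    let k = n.min(valid.len());
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    // Partial Fisher-Yates shuffle: afterwards valid[..k] is a random k-subset
    // (modulo bias from `%` is ignored for brevity).
    for i in 0..k {
        let j = i + (rng.next_u64() % (valid.len() - i) as u64) as usize;
        valid.swap(i, j);
    }
    valid[..k].iter().map(|&i| col.row(i).to_vec()).collect()
}

fn main() {
    // 4 rows of dim 2; rows 1 and 3 are null. Their slots hold zeros, which is
    // what a bitmap-unaware read of the value buffer would silently use.
    let col = VectorColumn {
        dim: 2,
        values: vec![1.0, 2.0, 0.0, 0.0, 3.0, 4.0, 0.0, 0.0],
        validity: vec![true, false, true, false],
    };
    let sample = sample_non_null(&col, 10, 42);
    // Mirrors the PR's extra checks: nothing null-derived reaches training.
    assert!(sample.iter().all(|v| v.as_slice() != [0.0, 0.0]));
    println!("nulls skipped: {}, sampled: {:?}", col.null_count(), sample);
}
```

Filtering indices up front, rather than sampling first and discarding nulls afterwards, keeps the sample size at the requested `n` whenever enough valid rows exist.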
Showing 5 changed files with 282 additions and 7 deletions.