Commit
feat: use hyperloglog for cardinality estimation for dictionary encoding (#2555)

Currently, to determine whether dictionary encoding should be applied, we use a `HashSet` to compute the exact cardinality. However, I believe that exact cardinality isn't necessary in this context. We could therefore use HyperLogLog for a rough cardinality estimate, which might save memory and potentially speed up the cardinality check.

* `HyperLogLog` uses a fixed amount of memory, determined by the precision set in the code. The `HashSet` uses roughly `threshold * each_item_string_size` of memory, so if some items are large it may use a non-trivial amount of memory.
* `HyperLogLog` has an error rate (1.56%, derived from precision 12), while `HashSet` is exact.
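To make the trade-off described in the commit message concrete, here is a minimal HyperLogLog sketch that uses only the Rust standard library. It is an illustration under stated assumptions, not the code from this commit: the names `HyperLogLog` (as defined here) and `looks_low_cardinality`, the choice of `DefaultHasher`, and the one-byte-per-register layout are all hypothetical, and the commit itself may rely on an existing crate with a different API. With precision 12 the sketch keeps 2^12 = 4096 registers regardless of how large the input strings are. The 1.56% figure quoted in the message equals 1/sqrt(4096); the commonly cited HyperLogLog standard error, 1.04/sqrt(m), works out to about 1.63% at the same precision.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative HyperLogLog; NOT the implementation used by the commit.
pub struct HyperLogLog {
    precision: u32,
    registers: Vec<u8>, // 2^precision registers, one byte each in this sketch
}

impl HyperLogLog {
    pub fn new(precision: u32) -> Self {
        assert!((4..=16).contains(&precision), "precision must be in 4..=16");
        Self { precision, registers: vec![0; 1usize << precision] }
    }

    /// Number of bytes held by the registers; independent of item size.
    pub fn register_bytes(&self) -> usize {
        self.registers.len()
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let hash = hasher.finish();
        // The top `precision` bits select a register.
        let index = (hash >> (64 - self.precision)) as usize;
        // Rank = 1-based position of the first set bit in the remaining bits.
        let rest = hash << self.precision;
        let max_rank = 64 - self.precision + 1;
        let rank = (rest.leading_zeros() + 1).min(max_rank) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    pub fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        // Bias-correction constant from the original HyperLogLog analysis.
        let alpha = match self.registers.len() {
            16 => 0.673,
            32 => 0.697,
            64 => 0.709,
            _ => 0.7213 / (1.0 + 1.079 / m),
        };
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}

/// Hypothetical helper: treat a column as a dictionary-encoding candidate
/// when its estimated cardinality is below `threshold`.
pub fn looks_low_cardinality<'a, I>(values: I, threshold: usize) -> bool
where
    I: IntoIterator<Item = &'a str>,
{
    let mut hll = HyperLogLog::new(12);
    for v in values {
        hll.insert(v);
    }
    hll.estimate() < threshold as f64
}
```

Because the estimate is approximate, a column whose true cardinality sits very close to the threshold can land on either side of it. That is the accuracy the commit message argues is acceptable for this decision.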

Showing 3 changed files with 82 additions and 32 deletions.