Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk decompression of text columns #6458

Merged
merged 1 commit into from
Jan 31, 2024
Merged

Conversation

akuzm
Copy link
Member

@akuzm akuzm commented Dec 21, 2023

Implement bulk decompression for text columns. This will allow us to use them in the vectorized computation pipeline.

No changes in performance: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3093&var-run2=3078&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20*%20percentile_cont(0.90)&var-exact_suite_version=false

Disable-check: force-changelog-file

Copy link

codecov bot commented Dec 21, 2023

Codecov Report

Attention: 44 lines in your changes are missing coverage. Please review.

Comparison is base (85b27b4) 79.79% compared to head (7985c76) 79.78%.

Files Patch % Lines
tsl/src/compression/array.c 76.27% 0 Missing and 14 partials ⚠️
tsl/src/compression/dictionary.c 76.36% 0 Missing and 13 partials ⚠️
tsl/test/src/decompress_text_test_impl.c 71.42% 4 Missing and 6 partials ⚠️
tsl/src/nodes/decompress_chunk/compressed_batch.c 86.27% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6458      +/-   ##
==========================================
- Coverage   79.79%   79.78%   -0.01%     
==========================================
  Files         190      190              
  Lines       36949    37101     +152     
  Branches     9356     9403      +47     
==========================================
+ Hits        29482    29600     +118     
- Misses       3116     3127      +11     
- Partials     4351     4374      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@akuzm akuzm mentioned this pull request Dec 21, 2023
7 tasks
@akuzm akuzm marked this pull request as ready for review January 8, 2024 11:50
Copy link

github-actions bot commented Jan 8, 2024

@fabriziomello, @antekresic: please review this pull request.

Powered by pull-review

Copy link
Member

@svenklemm svenklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have test files/functions in dedicated direcotry?

@antekresic antekresic added this to the TimescaleDB 2.14 milestone Jan 29, 2024
Copy link
Contributor

@antekresic antekresic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the PR is light on sql tests? I see a bunch of fuzzing tests were added but that's mostly it.

@akuzm
Copy link
Member Author

akuzm commented Jan 30, 2024

Is there a reason why the PR is light on sql tests? I see a bunch of fuzzing tests were added but that's mostly it.

Since the bulk decompression is enabled by default, it is covered by many existing tests that use text columns. I plan to add some SQL tests when we have vectorized filters on text columns: https://github.com/timescale/timescaledb/pull/6189/files#diff-0068ab49636b29bd1b20ebdd4f87f04251dae1a461f499500179100eab2c2494

@akuzm
Copy link
Member Author

akuzm commented Jan 30, 2024

Can we have test files/functions in dedicated direcotry?

Moved them to tsl/test/src. I also tried to make a proper CMake flag for fuzzer support, that would also find the libraries and configure the compiler, but I can't make the CMake work. That directory is a separate library, but it's not actually linked as library but is somehow concatenated to tsl library using TARGET_OBJECTS... So for now the compiler flags and library paths are still specified manually in the libfuzzer workflow.

Implement bulk decompression for text columns. This will allow us to use
them in the vectorized computation pipeline.
@akuzm akuzm force-pushed the decompression-only branch from f770fd3 to 7985c76 Compare January 31, 2024 12:44
@akuzm akuzm enabled auto-merge (rebase) January 31, 2024 12:44
@akuzm akuzm merged commit bf20e5f into timescale:main Jan 31, 2024
51 of 53 checks passed
@akuzm akuzm deleted the decompression-only branch January 31, 2024 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants