Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add table mappings to compression settings #6990

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

erimatnor
Copy link
Contributor

@erimatnor erimatnor commented Jun 3, 2024

Refactor the compression settings metadata table to include a mapping from a chunk's relid to its compressed chunk's relid.

This simplifies the code that looks up compression metadata as it no longer requires first looking up the corresponding compressed chunk in the "chunk" metadata table. Instead, given the non-compressed chunk's relid, the corresponding compressed chunk's relid can easily be looked up via the compression settings.

The refactoring is a step towards removing a chunk's compression table from the chunk metadata. In other words, the "compressed chunk" will no longer be a chunk, just a relation associated with the regular chunk via compression settings. However, this requires further refactoring that is left for follow-up changes.

Disable-check: force-changelog-file

@erimatnor erimatnor added compression tech-debt Needs refactoring and improvement tasks related to the source code and its architecture. labels Jun 3, 2024
@erimatnor erimatnor force-pushed the refactor-compression-settings branch 7 times, most recently from a48c167 to f2652ad Compare June 4, 2024 09:22
Copy link

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 98.26087% with 2 lines in your changes missing coverage. Please review.

Project coverage is 81.26%. Comparing base (59f50f2) to head (3bff242).
Report is 719 commits behind head on main.

Files with missing lines Patch % Lines
tsl/src/compression/compression_dml.c 94.44% 0 Missing and 1 partial ⚠️
tsl/src/compression/create.c 85.71% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6990      +/-   ##
==========================================
+ Coverage   80.06%   81.26%   +1.19%     
==========================================
  Files         190      240      +50     
  Lines       37181    44729    +7548     
  Branches     9450    11163    +1713     
==========================================
+ Hits        29770    36348    +6578     
- Misses       2997     3991     +994     
+ Partials     4414     4390      -24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@erimatnor erimatnor force-pushed the refactor-compression-settings branch from f2652ad to b9df9ea Compare June 4, 2024 10:21
@erimatnor erimatnor marked this pull request as ready for review June 4, 2024 11:57
@erimatnor erimatnor requested review from svenklemm and mkindahl June 4, 2024 11:57
Copy link
Member

@svenklemm svenklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this PR might need a bit more work. If we're going to store info about compressed and uncompressed assignments in the compression_settings table, we should take it out of the chunk catalog table in this same update. Storing the same thing in two places can lead to confusion and mistakes. We want to keep things tidy and have one source of truth for this information.

@erimatnor
Copy link
Contributor Author

It looks like this PR might need a bit more work. If we're going to store info about compressed and uncompressed assignments in the compression_settings table, we should take it out of the chunk catalog table in this same update. Storing the same thing in two places can lead to confusion and mistakes. We want to keep things tidy and have one source of truth for this information.

While I agree we should get rid of the metadata in the chunk catalog table, I think we should do this in increments. That's a pretty heavy lift and has other consequences on dependent tables, like compression_chunk_size, etc.

This change does not mean we will store the same information in multiple places apart from an extra ID. I think we can live with this in the short term until the other changes are merged too.

@erimatnor erimatnor requested a review from antekresic June 7, 2024 16:08
Copy link
Contributor

@antekresic antekresic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity, lets make a concrete plan for the rest of the refactoring that should follow this PR for clarity and prioritization purposes.

Removing the compressed_chunk_id chunk metadata should be followed up and would clean up the catalog and our code nicely.

src/process_utility.c Outdated Show resolved Hide resolved
src/ts_catalog/compression_settings.c Show resolved Hide resolved
src/ts_catalog/compression_settings.c Show resolved Hide resolved
tsl/src/compression/compression.c Outdated Show resolved Hide resolved
@mkindahl mkindahl requested review from antekresic and svenklemm and removed request for mkindahl November 20, 2024 07:40
@erimatnor erimatnor force-pushed the refactor-compression-settings branch 3 times, most recently from ada1790 to dc8a969 Compare January 29, 2025 10:16
@erimatnor
Copy link
Contributor Author

For posterity, lets make a concrete plan for the rest of the refactoring that should follow this PR for clarity and prioritization purposes.

Here's the plan: https://github.com/timescale/eng-database/issues/618

Removing the compressed_chunk_id chunk metadata should be followed up and would clean up the catalog and our code nicely.

100%. Already working on this, but it is a much bigger change so probably requires multiple follow up PRs. Next step is to remove unnecessary fields in compression_chunk_size, for example.

@erimatnor
Copy link
Contributor Author

Storing the same thing in two places can lead to confusion and mistakes. We want to keep things tidy and have one source of truth for this information.

@svenklemm I agree with single source of truth. However, we already have places were we duplicate information (chunk Ids), like compression_chunk_size, for example.

The intention is to address this over multiple refactoring PRs, including getting rid of compressed chunks in the "chunk" table. But his is further down the line and this is just the first step. I do think it is important to make these changes incrementally. Doing everything in one PR will lead to massive changes and it will be very difficult to review.

@erimatnor erimatnor force-pushed the refactor-compression-settings branch 3 times, most recently from 051b5a4 to a0eff1d Compare January 29, 2025 17:01
@erimatnor erimatnor force-pushed the refactor-compression-settings branch 3 times, most recently from 892ea73 to 5ed75cf Compare January 30, 2025 08:14
Refactor the compression settings metadata table to include a mapping
from a chunk's relid to its compressed chunk's relid.

This simplifies the code that looks up compression metadata as it no
longer requires first looking up the corresponding compressed chunk in
the "chunk" metadata table. Instead, given the non-compressed chunk's
relid, the corresponding compressed chunk's relid can easily be looked
up via the compression settings.

The refactoring is a step towards removing a chunk's compression table
from the chunk metadata. In other words, the "compressed chunk" will
no longer be a chunk, just a relation associated with the regular
chunk via compression settings. However, this requires further
refactoring that is left for follow-up changes.
@erimatnor erimatnor force-pushed the refactor-compression-settings branch from 5ed75cf to 3bff242 Compare January 30, 2025 08:17
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good step in the direction of avoiding unnecessary lookup of catalog data.

From some of the changes made here I see that we still have a lot of unnecessary indirection causing us to add extraneous updates of catalog data, e.g., attribute names, so we are far from done.

sql/pre_install/tables.sql Show resolved Hide resolved
src/chunk.c Show resolved Hide resolved
src/ts_catalog/compression_settings.c Show resolved Hide resolved
src/ts_catalog/compression_settings.c Show resolved Hide resolved
src/ts_catalog/compression_settings.c Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compression tech-debt Needs refactoring and improvement tasks related to the source code and its architecture.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants