sql: do not collect histograms for non-indexed JSON columns #139766

Merged 1 commit on Jan 27, 2025

mgartner
Collaborator

Informs #139381

Release note (sql change): Since v23.2 table statistics histograms have
been collected for non-indexed JSON columns. Histograms are no longer
collected for these columns. This reduces memory usage during table
statistics collection, for both automatic and manual collection via
ANALYZE and CREATE STATISTICS. This can be reverted by setting the
cluster setting sql.stats.non_indexed_json_histograms.enabled to
true.
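
As a rough illustration of the rule described in the release note, the following Go sketch models the decision for a single JSON column. It is not the code from `pkg/sql/create_stats.go`: the function name `collectsJSONHistogram` and its parameters are hypothetical, and the sketch assumes that indexed JSON columns are unaffected by the setting.

```go
package main

import "fmt"

// collectsJSONHistogram is a hypothetical helper (not part of CockroachDB).
// It reports whether a histogram would be collected for a JSON column under
// the behavior described in the release note.
//
//   - indexed: whether the JSON column is indexed (assumed to keep its
//     histogram regardless of the setting).
//   - settingEnabled: models the cluster setting
//     sql.stats.non_indexed_json_histograms.enabled.
func collectsJSONHistogram(indexed, settingEnabled bool) bool {
	// Non-indexed JSON columns get a histogram only when the setting is on.
	return indexed || settingEnabled
}

func main() {
	// Print the full truth table for the two inputs.
	for _, indexed := range []bool{false, true} {
		for _, enabled := range []bool{false, true} {
			fmt.Printf("indexed=%t setting=%t -> histogram=%t\n",
				indexed, enabled, collectsJSONHistogram(indexed, enabled))
		}
	}
}
```

With the setting at `false`, only the non-indexed case changes relative to the behavior since v23.2; setting it to `true` restores the earlier behavior.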

@mgartner added the backport-23.2.x, backport-24.1.x, backport-24.2.x, backport-24.3.x, and backport-25.1.x labels on Jan 24, 2025
@mgartner requested review from a team on January 24, 2025 18:22
@mgartner requested a review from a team as a code owner on January 24, 2025 18:22
@mgartner requested review from DrewKimball and removed request for a team on January 24, 2025 18:22

blathers-crl bot commented Jan 24, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@mgartner
Collaborator Author

I'll backport this with the setting on by default so that the change is opt-in for past releases.

Member

@yuzefovich left a comment

Nice!

I wonder whether we should backport the change with the cluster setting set to false. We started collecting 2-bucket histograms on JSON in 23.2 as a consequence of supporting forward indexes on JSON, and I don't think collecting them was intentional. In other words, 23.2 introduced a behavior change, and we have already seen support tickets because of it. Anecdotally, we also occasionally see tickets and questions about why memory usage has increased noticeably after a major upgrade (though most of those were probably attributed to the GOGC change). On the other hand, it seems unlikely that having 2-bucket histograms on non-indexed JSON columns helps with query planning (I don't recall any tickets around this on pre-23.2 versions), so it seems OK to revert to the original behavior in a patch release.

That said, I understand that we have more leeway with behavior changes in major releases, so the more conservative approach is to backport with true like you're suggesting. I just wonder whether the benefits of having the setting default to false on backports - for customers and for our support load - outweigh the risk of having a behavior change in a patch.


Also, should we add a quick logic test for this?

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


pkg/sql/create_stats.go line 393 at r1 (raw file):

// are collected for those columns as well.
//
// If nonIndexedJsonHistograms is true, 2-bucket histograms are collected for

nit: s/nonIndexed/nonIndex/.

@mgartner force-pushed the 139381-1-no-json-sampling branch from 81106a2 to c9596c5 on January 24, 2025 19:14
Collaborator Author

@mgartner left a comment

> Also, should we add a quick logic test for this?

Done.

> I wonder whether we should backport the change with the cluster setting set to false.

I agree: I don't think removing these 2-bucket histograms will affect planning. But there is a non-zero chance that it causes some inadvertent change, so I'm on the fence. There is a middle ground: backport now with the default set to true (no change in behavior), and later, once we have more confidence that false works well for other workloads, change the default to false in another backport.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/create_stats.go line 393 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/nonIndexed/nonIndex/.

Done.

Member

@yuzefovich left a comment

Sounds good. :lgtm:

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

@mgartner force-pushed the 139381-1-no-json-sampling branch from c9596c5 to 0319992 on January 24, 2025 20:48
@mgartner force-pushed the 139381-1-no-json-sampling branch from 0319992 to 6de4846 on January 24, 2025 22:35
@mgartner requested a review from a team as a code owner on January 24, 2025 22:35
@mgartner
Collaborator Author

TFTR!

bors r+

@craig bot merged commit eb0fc95 into cockroachdb:master on Jan 27, 2025
22 checks passed

blathers-crl bot commented Jan 27, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 6de4846 to blathers/backport-release-23.2-139766: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


error creating merge commit from 6de4846 to blathers/backport-release-24.1-139766: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.1.x failed. See errors above.


error creating merge commit from 6de4846 to blathers/backport-release-24.2-139766: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.2.x failed. See errors above.


error setting reviewers, but backport branch blathers/backport-release-24.3-139766 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/139897/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 24.3.x failed. See errors above.


error setting reviewers, but backport branch blathers/backport-release-25.1-139766 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/139898/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 25.1.x failed. See errors above.


@mgartner
Collaborator Author

blathers backport 24.3.5-rc


blathers-crl bot commented Jan 31, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error getting backport branch release-24.3.5-rc: unexpected status code: 404 Not Found

Backport to branch 24.3.5-rc failed. See errors above.

