Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support KeyRange in --clusters_to_watch flag #17604

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

GuptaManan100
Copy link
Member

@GuptaManan100 GuptaManan100 commented Jan 22, 2025

Description

This PR addresses the feature request #17537.
Now clusters_to_watch flag accepts key ranges. The format for input is still the same, and therefore backward compatible.
Internally Vitess now treats the input as key ranges instead of explicit shard names.

This allows the users to not restart VTOrc in case of a reshard. For example, if a VTOrc is configured to watch ks/-80, then it would watch all the shards that fall under the KeyRange -80. If a reshard is run and, -80 is split into new shards -40, and 40-80, the VTOrc instance will automatically start watching the new shard without needing a restart.

As part of this change, I have restructured the shardsToWatch map to store key ranges instead of strings. I've also made it so that we don't need to update the shards information by insteading storing in the map that we are watching all the shards using a complete key range.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Copy link
Contributor

vitess-bot bot commented Jan 22, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jan 22, 2025
@github-actions github-actions bot added this to the v22.0.0 milestone Jan 22, 2025
@GuptaManan100 GuptaManan100 removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jan 22, 2025
Comment on lines -329 to -334
// Refresh shards to watch.
eg.Go(func() error {
updateShardsToWatch()
return nil
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this anymore, because we don't store the explicit shard names for the keyspaces that we want to watch in the entirety. Instead we just store a complete key range and that doesn't change during the course of running of a VTOrc instance.

Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 48 lines in your changes missing coverage. Please review.

Project coverage is 67.62%. Comparing base (d1aa2f4) to head (b51bcc7).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
go/vt/vtorc/logic/tablet_discovery.go 0.00% 40 Missing ⚠️
go/vt/key/key.go 0.00% 5 Missing ⚠️
go/vt/vtorc/logic/keyspace_shard_discovery.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17604      +/-   ##
==========================================
- Coverage   67.68%   67.62%   -0.07%     
==========================================
  Files        1586     1586              
  Lines      255170   255627     +457     
==========================================
+ Hits       172722   172877     +155     
- Misses      82448    82750     +302     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@timvaillancourt timvaillancourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @GuptaManan100 🚀

@@ -133,6 +134,11 @@ This is the last time this will be needed in the `8.0.x` series, as starting wit

The base system now uses Debian Bookworm instead of Debian Bullseye for the `vitess/lite` images. This change was brought by [Pull Request #17552].

### <a id="key-range-vtorc"/>KeyRanges in `--clusters_to_watch` in VTOrc</a>
VTOrc now supports specifying KeyRanges in the `--clusters_to_watch` flag. This is useful in scenarios where you don't need to restart a VTOrc instance if you run a reshard.
For example, if a VTOrc is configured to watch `ks/-80`, then it would watch all the shards that fall under the KeyRange `-80`. If a reshard is run and, `-80` is split into new shards `-40`, and `40-80`, the VTOrc instance will automatically start watching the new shard without needing a restart.
Copy link
Contributor

@timvaillancourt timvaillancourt Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuptaManan100 I think this explains the "new" behaviour very well, but I feel like a user of the old way won't fully understand the risks

For example, a user with --clusters_to_watch foo/-80 in the previous logic will watch 1 (or 0) shards only. After this change, with the same flag value they would watch any shard from - to 80, which could be surprising. So I guess I'm saying explaining the difference between the old logic and this PR might make this risk more clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I add to make it more clear? Should I add a line something like -
In the previous logic, watching a shard -80 would watch 1 (or 0) shard only. In the new system, since we interpret -80 as a key range, it can lead to a watch on multiple shards as described in the example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuptaManan100 that extra detail is great 👍

Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I just have the one noted concern. Let me know if I'm missing or misunderstanding something. Either way, I'll come back to this quickly and we can get it merged. ❤️

go/vt/vtorc/logic/tablet_discovery.go Outdated Show resolved Hide resolved
go/vt/vtorc/logic/tablet_discovery.go Outdated Show resolved Hide resolved
Signed-off-by: Manan Gupta <[email protected]>
@GuptaManan100
Copy link
Member Author

@mattlord I talked to @deepthi and she said that we only support 0 as a name of a shard other than range based shards. Everything else like 2 and wassup are names that we no longer support. They are legacy names that might be around, but we want to drop support for 0 too, and go to only range based shard names.

In this light, the way the function is implemented (and corresponding tests verify this), is that if a users gives us something like ks/2 to watch, we will watch the entire keyspace ks because 2 is not a range based shard. The only use case should be if a user has an unsharded keysapce ks and they specify ks/0 and this too would work fine, because we would watch the entire shard range and the unsharded shard will be part of it.

Do you think it's worth adding the logic to continue to support arbitrary shard names? I can do that too, by basically creating another map from keyspace to shard names (like we had before), in addition to what we have, and store the shards that aren't range based in them, and continue to run equality checks for them. I personally, don't think we should add support for constructs that we've deprecated (I've added a catch-all to still watch all the shards), but I don't have strong opinions.

@mattlord
Copy link
Contributor

@mattlord I talked to @deepthi and she said that we only support 0 as a name of a shard other than range based shards. Everything else like 2 and wassup are names that we no longer support. They are legacy names that might be around, but we want to drop support for 0 too, and go to only range based shard names.

That is exactly what the key package does. It supports 0 as a valid key range alias for -. So I'm saying that the code should do proper input validation to enforce this and return an error to the user if they specify an invalid key range (rather than accepting it and doing something unexpected).

In this light, the way the function is implemented (and corresponding tests verify this), is that if a users gives us something like ks/2 to watch, we will watch the entire keyspace ks because 2 is not a range based shard. The only use case should be if a user has an unsharded keysapce ks and they specify ks/0 and this too would work fine, because we would watch the entire shard range and the unsharded shard will be part of it.

But your usage of topo.ValidateShardName, which supports the use of arbitrary shard names, means that we accept them as input. Why would we do that rather than using key.IsValidKeyRange and key.ParseShardingSpec to do proper input validation? Again, those do support 0 as an alias for - if that was your concern.

Do you think it's worth adding the logic to continue to support arbitrary shard names? I can do that too, by basically creating another map from keyspace to shard names (like we had before), in addition to what we have, and store the shards that aren't range based in them, and continue to run equality checks for them. I personally, don't think we should add support for constructs that we've deprecated (I've added a catch-all to still watch all the shards), but I don't have strong opinions.

I'm not suggesting we support arbitrary shard names, I'm suggesting that we do not allow it. 🙂

I'm not suggesting that we support shard names, I'm saying that currently we do support them because the input validator that you're using — topo.ValidateShardName — supports them. And with unexpected behavior IMO (we treat all shard names as -). I'm saying that we should instead do input validation to enforce the key range requirement and return an error to the user if they specify shard values that are NOT valid key ranges (the keys package already supports 0 as an alias for - if that was your concern).

@mattlord I talked to @deepthi and she said that we only support 0 as a name of a shard other than range based shards. Everything else like 2 and wassup are names that we no longer support. They are legacy names that might be around, but we want to drop support for 0 too, and go to only range based shard names.

Yes, I know. I'm suggesting that the code should match your intentions here and do input validation and alert the user to this problematic usage (they can always just specify the keyspace name by itself).

In this light, the way the function is implemented (and corresponding tests verify this), is that if a users gives us something like ks/2 to watch, we will watch the entire keyspace ks because 2 is not a range based shard. The only use case should be if a user has an unsharded keysapce ks and they specify ks/0 and this too would work fine, because we would watch the entire shard range and the unsharded shard will be part of it.

IMO ks/2 as input should return an error.

Does this make sense? I don't feel too strongly about it, so I'm OK with you making the final call.

@mattlord
Copy link
Contributor

mattlord commented Jan 24, 2025

@GuptaManan100 why I think this discussion is worth having and why IMO we should do proper input validation is this... let's say that I have two shards: ks/-80 and ks/80- and I only want this vtorc to watch -80. But when settings things up I make a typo and use ks/80. vtorc will happily start up without saying anything about my input, and it will do something I do not want or expect: it will watch both shards.

@GuptaManan100
Copy link
Member Author

Yes, I see what you mean. You're right, I'll make the change ❤️

@GuptaManan100
Copy link
Member Author

I've made both the changes @mattlord! You can take a look again! Thank you!

Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @GuptaManan100 ! I like it 🙂

Comment on lines +84 to +89
return fmt.Errorf("Invalid key range %q while parsing clusters to watch", s)
}
// Parse the shard name into key range value.
keyRanges, err := key.ParseShardingSpec(s)
if err != nil {
return fmt.Errorf("Could not parse shard name %q: %+v", s, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but errors aren't supposed to use sentence capitalization or punctuation.

updateShardsToWatch()
err := initializeShardsToWatch()
if err != nil {
log.Fatalf("Error parsing clusters to watch: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but IMO it would be nicer to mention --clusters_to_watch to help the operator track down what they need to fix (could be in a config file e.g.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VTorc Vitess Orchestrator integration Type: Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: support ranges of shards in VTOrc --clusters_to_watch
3 participants