Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: add compaction offstrategy option #9095

Closed

Conversation

bhalevy
Copy link
Member

@bhalevy bhalevy commented Jul 26, 2021

Trigger offstrategy compaction via the REST API with a offstrategy option.

This is useful for triggering offstrategy compaction
post repair, after repairing all token ranges.

Otherwise, offstrategy compaction will only be
auto-triggered after a 5 minutes idle timeout.

Like major compaction, if offstrategy compaction
is requried the api call waits on its future
and returns its status.

Test: unit(dev)
DTest: repair_additional_test:RepairAdditionalTest.{repair_option_pr_{multi_dc,dc_host}_test,repair_option_cf_test}

Signed-off-by: Benny Halevy [email protected]

@bhalevy bhalevy force-pushed the trigger_offstrategy_compaction branch from a8f1d89 to 6e0e974 Compare August 1, 2021 11:04
@bhalevy bhalevy changed the title api: add keyspace_trigger_offstrategy_compaction api: add compaction offstrategy option Aug 1, 2021
@bhalevy bhalevy requested a review from raphaelsc August 1, 2021 11:17
@bhalevy bhalevy force-pushed the trigger_offstrategy_compaction branch from 6e0e974 to 15dcd18 Compare August 5, 2021 10:28
@bhalevy
Copy link
Member Author

bhalevy commented Aug 5, 2021

In v3:

  • keep void compaction_manager::submit_offstrategy()
  • expose future<> compaction_manager::do_submit_offstrategy()
    • self-contained, checked if offstrategy compaction needs triggering, otherwise returns a ready future.
    • use that from api call via future<> table::submit_offstrategy_compaction()

@bhalevy bhalevy requested a review from raphaelsc August 5, 2021 10:35
Copy link
Member

@raphaelsc raphaelsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bhalevy
Copy link
Member Author

bhalevy commented Aug 5, 2021

@amnonh if you have no objection, @scylladb/scylla-maint let's merge this change, since the following chain of PRs depend on it: scylladb/scylla-jmx#172 -> scylladb/scylla-tools-java#269 -> scylladb/scylla-dtest#2237.

@amnonh
Copy link
Contributor

amnonh commented Aug 5, 2021

LGTM

Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good, but I left a couple of comments for your consideration.

@bhalevy bhalevy force-pushed the trigger_offstrategy_compaction branch 2 times, most recently from a47b7cb to c4f7d29 Compare August 8, 2021 07:54
@bhalevy
Copy link
Member Author

bhalevy commented Aug 8, 2021

In v4:

  • compaction_manager::do_submit_offstrategy -> perform_offstrategy
  • table::submit_offstrategy_compaction -> perform_offstrategy_compaction
  • do not checkout if _off_strategy_trigger is armed, always call table::perform_offstrategy_compaction.
  • table: offstrategy compaction: add minimal documentation to api methods

@bhalevy bhalevy requested review from nyh and raphaelsc August 8, 2021 07:56
Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, but I left some small comments/requests behind.

@@ -623,7 +623,7 @@ void compaction_manager::submit(column_family* cf) {
});
}

void compaction_manager::submit_offstrategy(column_family* cf) {
future<> compaction_manager::perform_offstrategy(column_family* cf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message still refers to the old names.

},
{
"name":"offstrategy",
"description":"When set to true, trigger offstrategy compaction (and return immediately)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't provide a better explanation of what this "offstrategy compaction" means, we need to create documentation about it - Google didn't find any. As a user I would have no idea what exactly will happen when I "trigger offstrategy compaction" or why or when I would want to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #9162 as a follow-up.

database.hh Outdated
future<> perform_offstrategy_compaction();
// Run offstrategy compaction.
// Typically called only internally by the compaction manager or unit tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation doesn't explain to me how or why run_offstrategy_compaction() and perform_offstrategy_compaction() differ - they have the same signature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should consider moving run_offstrategy_compaction to be private and befriend its callers.

That said, I talked with Raphael already about defining a formal abstraction of the table/compaction api so that compaction-manager and compaction would be provided with an object through which they would call back into the table (or test) code, rather than calling them directly on the table. This way compaction will operate on an abstract container of sstables that could be a table object, but it could also be a TWCS bucket, for example, or just a bunch of sstables in a temporary directory for testing.

bhalevy added 3 commits August 9, 2021 09:38
Return a future from perform_offstrategy, resolved
when the offstrategy compaction completes so that callers
can wait on it.

submit_offstrategy still submits the offstrategy compaction
in the background.

Signed-off-by: Benny Halevy <[email protected]>
So it would be easier to relate the messages to the
table for which it was submitted.

Signed-off-by: Benny Halevy <[email protected]>
Perform offstrategy compaction via the REST API with
a `offstrategy` option.

This is useful for performing offstrategy compaction
post repair, after repairing all token ranges.

Otherwise, offstrategy compaction will only be
auto-triggered after a 5 minutes idle timeout.

Like major compaction, the api call returns the offstrategy
compaction task future, so it's waited on.

Signed-off-by: Benny Halevy <[email protected]>
bhalevy added 2 commits August 9, 2021 10:45
And rename it to do_run_offstrategy_compaction in the process.
It is meant to be called back from the compaction manager.

"Users" of the table API should now call
`perform_offstrategy_compaction`.

Signed-off-by: Benny Halevy <[email protected]>
It is hard to tell which of
{trigger,perform,run}_offstrategy_compaction
does what.

Signed-off-by: Benny Halevy <[email protected]>
@bhalevy bhalevy force-pushed the trigger_offstrategy_compaction branch from c4f7d29 to 123e407 Compare August 9, 2021 08:02
@bhalevy
Copy link
Member Author

bhalevy commented Aug 9, 2021

In v5:

  • fixed commitlog message re: compaction_manager::perform_offstrategy
  • made table::run_offstrategy_compaction private
    • and renamed to do_run_offstrategy_compaction

@avikivity
Copy link
Member

What's the difference between offstrategy and major compaction?

If the difference between the current table shape and the strategy goal is large, then offstrategy will perform something similar to major compaction, no?

And if it's small, then the compaction strategy will work to reduce it.

I'm sure there is some difference, but does it merit a new API?

@bhalevy
Copy link
Member Author

bhalevy commented Aug 9, 2021

What's the difference between offstrategy and major compaction?

If the difference between the current table shape and the strategy goal is large, then offstrategy will perform something similar to major compaction, no?

And if it's small, then the compaction strategy will work to reduce it.

I'm sure there is some difference, but does it merit a new API?

Major compaction's goal is different than offstrategy compaction.
The former's goal, e.g. with STCS, is to compact all data into a single sstable,
while the latter's goal is to bring the sstables closer to the steady-state compaction-strategy invariant,
i.e. a state equivalent to after regular compaction (of all buckets).

@nyh
Copy link
Contributor

nyh commented Aug 9, 2021

Major compaction's goal is different than offstrategy compaction.
The former's goal, e.g. with STCS, is to compact all data into a single sstable,
while the latter's goal is to bring the sstables closer to the steady-state compaction-strategy invariant,
i.e. a state equivalent to after regular compaction (of all buckets).

So one might say that "off-strategy" compaction is a confusing name - it's doesn't do something "off-strategy" but quite the opposite - it tries to return the current off-strategy situation to be more like the strategy wants. Maybe it should be called "back-to-strategy compaction" :-) In any case, we clearly need documentation for this feature - the one-line string in the REST API explains nothing - but I saw you opened an issue for that.

@raphaelsc
Copy link
Member

raphaelsc commented Aug 9, 2021 via email

@nyh
Copy link
Contributor

nyh commented Aug 9, 2021

Actually it's called offstrategy because it's a compaction performed in a maintainance sstable set where sstables in there were created by maintenance operations (they aren't even considered by regular compaction therefore offstrategy).

Oh! This is quite different from what @bhalevy wrote above - are you saying that "off strategy compaction" doesn't even touch the "normal" sstables which major compaction works on - rather, it only works on some new sstables recently created by maintenance operations (which operations)? So if you didn't recently do any maintenance operations, it will not do anything at all?

@raphaelsc
Copy link
Member

raphaelsc commented Aug 9, 2021 via email

@bhalevy
Copy link
Member Author

bhalevy commented Aug 9, 2021

Actually it's called offstrategy because it's a compaction performed in a maintainance sstable set where sstables in there were created by maintenance operations (they aren't even considered by regular compaction therefore offstrategy).

Oh! This is quite different from what @bhalevy wrote above - are you saying that "off strategy compaction" doesn't even touch the "normal" sstables which major compaction works on - rather, it only works on some new sstables recently created by maintenance operations (which operations)? So if you didn't recently do any maintenance operations, it will not do anything at all?

Yeah, this is better put this way.

@bhalevy
Copy link
Member Author

bhalevy commented Aug 12, 2021

@nyh so what's the verdict? Are you ok with merging this change?

@bhalevy
Copy link
Member Author

bhalevy commented Aug 12, 2021

@avikivity please ack. Are you ok with this change?

@avikivity
Copy link
Member

Actually it's called offstrategy because it's a compaction performed in a maintainance sstable set where sstables in there were created by maintenance operations (they aren't even considered by regular compaction therefore offstrategy).

Oh! This is quite different from what @bhalevy wrote above - are you saying that "off strategy compaction" doesn't even touch the "normal" sstables which major compaction works on - rather, it only works on some new sstables recently created by maintenance operations (which operations)? So if you didn't recently do any maintenance operations, it will not do anything at all?

Yeah, this is better put this way.

I don't understand then how we can be in a situation that requires the API call. If we did a maintenance operation, it will automatically initiate offstrategy compaction into the main directory.

@bhalevy
Copy link
Member Author

bhalevy commented Aug 12, 2021

I don't understand then how we can be in a situation that requires the API call. If we did a maintenance operation, it will automatically initiate offstrategy compaction into the main directory.

It's not a requirement, just an optimization.
With regular repair, we don't know exactly when the meta-operation is finished since it is managed externally, e.g. by scylla-manager. Therefore, the decision was that we arm a 5 minutes timer that will submit the offstrategy compaction, if scylla is idle and no new repair operation starts.

This means that in the normal case we have a 5 minutes delay after repair finishes and until offstrategy compaction starts.
5 minutes is typically shorter than the time to run repair, but still it's a pretty long window, so the idea was to use this api to perform the offstrategy compaction immediately, when the external scheduler (e.g. scylla-manager, or a test) is done with repair and offstrategy compaction is good to go.

@nyh
Copy link
Contributor

nyh commented Aug 12, 2021

I don't understand then how we can be in a situation that requires the API call. If we did a maintenance operation, it will automatically initiate offstrategy compaction into the main directory.

It's not a requirement, just an optimization.

This will need to be documented well - if it's at all useful... The first description I read in this patch made it sound like "off-strategy compaction" is a legitimate alternative to "major compaction", that just behaves a bit differently in a way I didn't understand. Now it has become clear that "off-strategy compaction" only useful after some unspecified "management operations" (which specific operations are those? repair? anything else?), but even that happens even without this command, so what this command apparently does is to expedite this compaction which will happen in 5 minutes anyway. None of this can be understood by the description of that command.

With regular repair, we don't know exactly when the meta-operation is finished since it is managed externally, e.g. by scylla-manager. Therefore, the decision was that we arm a 5 minutes timer that will submit the offstrategy compaction, if scylla is idle and no new repair operation starts.

Maybe the default shouldn't be 5 minutes - that users might find a long boring wait, and look for a command to expedite this. What if, instead, it was a 1 minute delay? 1 minute is more than enough for scylla-manager to start another repair if it wants, but it's not long enough for users to go looking for a command to cancel that delay.

P.S. it's not that I think that there's anything "wrong" with this new API option, it's just that wouldn't it be better if it didn't have to exist, and users wouldn't even need to learn what it means?

@avikivity
Copy link
Member

I don't understand then how we can be in a situation that requires the API call. If we did a maintenance operation, it will automatically initiate offstrategy compaction into the main directory.

It's not a requirement, just an optimization.
With regular repair, we don't know exactly when the meta-operation is finished since it is managed externally, e.g. by scylla-manager. Therefore, the decision was that we arm a 5 minutes timer that will submit the offstrategy compaction, if scylla is idle and no new repair operation starts.

This means that in the normal case we have a 5 minutes delay after repair finishes and until offstrategy compaction starts.
5 minutes is typically shorter than the time to run repair, but still it's a pretty long window, so the idea was to use this api to perform the offstrategy compaction immediately, when the external scheduler (e.g. scylla-manager, or a test) is done with repair and offstrategy compaction is good to go.

So, this API call is to save five minutes?

@bhalevy
Copy link
Member Author

bhalevy commented Aug 15, 2021

I don't understand then how we can be in a situation that requires the API call. If we did a maintenance operation, it will automatically initiate offstrategy compaction into the main directory.

It's not a requirement, just an optimization.

This will need to be documented well - if it's at all useful... The first description I read in this patch made it sound like "off-strategy compaction" is a legitimate alternative to "major compaction", that just behaves a bit differently in a way I didn't understand. Now it has become clear that "off-strategy compaction" only useful after some unspecified "management operations" (which specific operations are those? repair? anything else?), but even that happens even without this command, so what this command apparently does is to expedite this compaction which will happen in 5 minutes anyway. None of this can be understood by the description of that command.

With regular repair, we don't know exactly when the meta-operation is finished since it is managed externally, e.g. by scylla-manager. Therefore, the decision was that we arm a 5 minutes timer that will submit the offstrategy compaction, if scylla is idle and no new repair operation starts.

Maybe the default shouldn't be 5 minutes - that users might find a long boring wait, and look for a command to expedite this. What if, instead, it was a 1 minute delay? 1 minute is more than enough for scylla-manager to start another repair if it wants, but it's not long enough for users to go looking for a command to cancel that delay.

P.S. it's not that I think that there's anything "wrong" with this new API option, it's just that wouldn't it be better if it didn't have to exist, and users wouldn't even need to learn what it means?

@nyh First, I think that offstrategy compaction can be triggered during repair when there are enough sstables already accumulated. By default we limit each compaction job to 32 sstables anyhow so there's no point in accumulating more than that per shard. Then, regarding 1 minute vs. 5 minutes, @asias and @raphaelsc what were these 5 minutes based on?

@raphaelsc
Copy link
Member

I don't understand then how we can be in a situation that requires the API call. If we did a maintenance operation, it will automatically initiate offstrategy compaction into the main directory.

It's not a requirement, just an optimization.

This will need to be documented well - if it's at all useful... The first description I read in this patch made it sound like "off-strategy compaction" is a legitimate alternative to "major compaction", that just behaves a bit differently in a way I didn't understand. Now it has become clear that "off-strategy compaction" only useful after some unspecified "management operations" (which specific operations are those? repair? anything else?), but even that happens even without this command, so what this command apparently does is to expedite this compaction which will happen in 5 minutes anyway. None of this can be understood by the description of that command.

With regular repair, we don't know exactly when the meta-operation is finished since it is managed externally, e.g. by scylla-manager. Therefore, the decision was that we arm a 5 minutes timer that will submit the offstrategy compaction, if scylla is idle and no new repair operation starts.

Maybe the default shouldn't be 5 minutes - that users might find a long boring wait, and look for a command to expedite this. What if, instead, it was a 1 minute delay? 1 minute is more than enough for scylla-manager to start another repair if it wants, but it's not long enough for users to go looking for a command to cancel that delay.
P.S. it's not that I think that there's anything "wrong" with this new API option, it's just that wouldn't it be better if it didn't have to exist, and users wouldn't even need to learn what it means?

@nyh First, I think that offstrategy compaction can be triggered during repair when there are enough sstables already accumulated. By default we limit each compaction job to 32 sstables anyhow so there's no point in accumulating more than that per shard. Then, regarding 1 minute vs. 5 minutes, @asias and @raphaelsc what were these 5 minutes based on?

@asias picked this. It's not a definitive value, but it's a reasonable one to begin with. Small values may lead to offstrategy running too early and that leads to inefficiency. Even if offstrategy takes a couple of minutes to kick off, that's ok because data from non overlapping sstables are being efficiently served from maintenance set. It's better to delay offstrategy than running it too early.

@bhalevy
Copy link
Member Author

bhalevy commented Jan 27, 2022

Closing in favor of #9980

@bhalevy bhalevy closed this Jan 27, 2022
denesb added a commit that referenced this pull request Feb 2, 2022
This series adds methods to perform offstrategy compaction, if needed, returning a future<bool>
so the caller can wait on it until compaction completes.
The returned value is true iff offstrategy compaction was needed.

The added keyspace_offstrategy_compaction calls perform_offstrategy_compaction on the specified keyspace and tables, return the number of tables that required offstrategy compaction.

A respective unit test was added to the rest_api pytest.

This PR replaces #9095 that suggested adding an option to `keyspace_compaction`
since offstrategy compaction triggering logic is different enough from major compaction meriting a new api.

Test: unit (dev)

Closes #9980

* github.com:scylladb/scylla:
  test: rest_api: add unit tests for keyspace_offstrategy_compaction api
  api: add keyspace_offstrategy_compaction
  compaction_manager: get rid of submit_offstrategy
  table: add perform_offstrategy_compaction
  compaction_manager: perform_offstrategy: print ks.cf in log messages
  compaction_manager: allow waiting on offstrategy compaction
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants