Add a resilver_txg_start for faster resilvering of multiple disks #16774

Open
Haravikk opened this issue Nov 17, 2024 · 11 comments
Labels
Type: Feature Feature request or new feature

Comments

@Haravikk

Haravikk commented Nov 17, 2024

Describe the feature you would like to see added to OpenZFS

When a disk is added to a pool that already has a resilver in progress, the current txg for the resilver should also be noted for the new drive as a new resilver_txg_start (or similarly named) value, so that the resulting deferred resilver can end early for that disk.

Basically, when the deferred resilver runs, once it hits the transaction identified by resilver_txg_start it can stop resilvering to that drive, as the drive should already be up to date on all later transactions (it has already been resilvered for these).

If all disks being resilvered have reached their resilver_txg_start then the resilver can either end early, rather than continuing all the way to the end for a second time, or the behaviour for disks that have already been resilvered could effectively become a scrub (instead of writing the data, compare it to what's already on disk first). The latter option arguably isn't necessary, but a second check shouldn't really hurt; the goal is to reduce unnecessary writing of data that is already present.

For disks added to a pool that is not already resilvering there should be no difference – they will resilver as normal, and if a deferred resilver is triggered by a second disk, the first disk should be skipped (as it will lack a resilver_txg_start value to resilver up to).
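The proposal can be illustrated with a toy model, a sketch under stated assumptions rather than OpenZFS code: the pool's data is treated as a fixed sequence of block positions scanned in order, and the hypothetical `joined_at` field stands in for the proposed resilver_txg_start, recorded as a scan position rather than a txg (whether a txg alone can express this point in a real scan is left to an implementation). `Disk` and `deferred_pass` are invented for illustration. Each deferred disk is written only until the scan reaches its recorded point, a disk without a recorded point is skipped, and the pass ends as soon as no deferred disk still needs data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy model, not OpenZFS code: the pool is a sequence of block positions
# scanned in order. `joined_at` is a hypothetical stand-in for the proposed
# resilver_txg_start, expressed as a scan position.


@dataclass
class Disk:
    name: str
    # Scan position at which this disk was attached to a running resilver.
    # None means the disk was not deferred, so the deferred pass skips it.
    joined_at: Optional[int] = None
    written: List[int] = field(default_factory=list)


def deferred_pass(num_blocks: int, disks: List[Disk]) -> int:
    """Run the deferred resilver and return the position where it stopped."""
    pending = [d for d in disks if d.joined_at is not None]
    pos = 0
    while pos < num_blocks:
        # A disk is done once the scan reaches the point where it joined:
        # the earlier resilver already covered [joined_at, num_blocks).
        pending = [d for d in pending if pos < d.joined_at]
        if not pending:
            break  # every deferred disk has caught up, so end early
        for d in pending:
            d.written.append(pos)  # copy block `pos` to this disk
        pos += 1
    return pos
```

For a disk attached at 25% of a 100-block scan, `deferred_pass(100, [Disk("A"), Disk("B", joined_at=25)])` stops at position 25, writes only positions 0 to 24 to B, and leaves A untouched. The scrub-like variant would replace the write step with a compare instead of ending the loop.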

How will this feature improve OpenZFS?

Currently when a disk is added to a pool for resilvering (usually via zpool attach or zpool replace) it is given a resilver_txg value (seen via zdb) which tracks its resilvering progress, so that earlier transactions can be skipped as already resilvered. This is also how drives that are temporarily unavailable (offline'd, lost connection etc.) can be resilvered by only copying changes since they went missing, as resilver_txg is set to the last txg they received.
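The incremental case described here can be modelled in the same spirit. In this sketch each block carries the txg it was born in; `blocks_to_resilver` is a hypothetical helper, and real OpenZFS tracks missing ranges per vdev with dirty time logs (DTLs) rather than a single number. Because ZFS is copy-on-write, modified data is written to new blocks with newer birth txgs, so a disk that last saw txg `last_seen_txg` only needs the blocks born after it.

```python
from typing import Dict, List

# Illustrative only: block name -> birth txg. Not an OpenZFS data structure.


def blocks_to_resilver(block_births: Dict[str, int], last_seen_txg: int) -> List[str]:
    """Return the blocks a returning disk is missing, in name order."""
    # Blocks born at or before last_seen_txg were already on the disk when it
    # dropped out; copy-on-write means later changes live in newer blocks.
    return sorted(name for name, birth in block_births.items() if birth > last_seen_txg)
```

For example, `blocks_to_resilver({"a": 100, "b": 150, "c": 205, "d": 230}, 200)` returns `["c", "d"]`.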

However, when a drive is added to a pool that is already resilvering, there appears to be no such tracking of the transactions it has missed; as a result, the deferred resilver that adding another drive triggers will resilver the entire drive as if it had never been a part of the pool, which is a massive waste of time (plus additional wear on the disk). It also seems to sometimes resilver a drive that wasn't added during an existing resilver, e.g. if you add two new disks to the pool, one at a time, the first will be resilvered twice in its entirety, while the second will be partially resilvered (from the current resilver_txg), then fully resilvered again.
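The double resilvering described in this paragraph can be tallied with a small simulation. It encodes the behaviour as reported here and does not check what ZFS actually does; `current_behaviour` is a hypothetical helper. disk1 is attached to an idle pool, disk2 is attached once the scan reaches `second_joins_at`, and the deferred pass then rewrites both disks in full.

```python
from typing import Dict


def current_behaviour(num_blocks: int, second_joins_at: int) -> Dict[str, int]:
    """Count block writes per disk under the behaviour reported in this issue."""
    writes = {"disk1": 0, "disk2": 0}
    # First resilver: disk1 from the start, disk2 only after it is attached.
    for pos in range(num_blocks):
        writes["disk1"] += 1
        if pos >= second_joins_at:
            writes["disk2"] += 1
    # Deferred resilver: as reported, both disks are rewritten in full.
    for _ in range(num_blocks):
        writes["disk1"] += 1
        writes["disk2"] += 1
    return writes
```

With 100 blocks and disk2 attached at position 25, the result is `{"disk1": 200, "disk2": 175}`, against the 100 writes per disk that are strictly needed.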

This is also an issue when a disk is added then detached, leaving a stalled resilver – I had this happen recently when discovering (to my dismay) another disk I didn't realise was SMR that I was adding as a replacement. I detached the disk, but this left a stalled resilver, then added a CMR replacement instead, but that replacement has been resilvered twice (once partially, then again from the beginning). While I could have forced the resilver to restart using zpool resilver, I realised this too late and it seems like a very unintuitive and unnecessary thing to do (since ZFS should know which part of the resilvering was missed).

Adding this additional case for resilvering should cover all cases that can be optimised (to avoid resilvering data that doesn't need to be) as a disk should either be outdated (new transaction since it was last seen) or new (missed earlier transactions).

@Haravikk Haravikk added the Type: Feature Feature request or new feature label Nov 17, 2024
@Haravikk Haravikk changed the title Add a resilver_txt_start for faster resilvering of multiple disks Add a resilver_txg_start for faster resilvering of multiple disks Nov 21, 2024
@snajpa
Contributor

snajpa commented Nov 27, 2024

Can you please describe your intended use-case in a single sentence? I can't seem to be able to extract it from your description.

Doesn't this solve any of your concerns? #15810

@snajpa
Contributor

snajpa commented Nov 27, 2024

Also, ugh, 25 open feature requests? Are you planning on helping with any of them or are you just adding more work for others? I don't like this. Just reading and extracting the ask from your very verbal descriptions seems like a lot of work.

@AllKind
Contributor

AllKind commented Nov 27, 2024

I think it's even more than 25, I checked it too.
However my personal opinion would be to keep the issue section strictly for bugs/problems and have people put the feature requests in the idea section of Discussions: https://github.com/openzfs/zfs/discussions/categories/ideas
Out of the 1390 issues 476 are feature requests. Given the reality of things it's highly unlikely even a small percentage will ever be implemented. So they just linger around and "pollute" the issue tracker.
Also it would be cleaner to have all issues at one place and all feature requests in another.

@Haravikk
Author

Haravikk commented Nov 27, 2024

> Can you please describe your intended use-case in a single sentence? I can't seem to be able to extract it from your description.

Don't resilver the entire disk during deferred resilvers?

> Doesn't this solve any of your concerns? #15810

Kind of but also not really – that proposal is essentially just the same as running zpool resilver pool after adding a disk that will be deferred, but neither solves the core problem which is that the resilvering doesn't track at what point in the resilver the deferred disk was added (so the deferred resilver resilvers everything from the beginning).

For example, if a resilver is already in progress and is 25% complete, and a second disk is added, ZFS will resilver that second disk from 25% to 100%, but the deferred resilver will then do the same again from 0% to 100%, rather than just doing 0% to 25% (the missing section).
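The arithmetic of this example generalises directly. Writing p for the fraction of the running resilver completed when the second disk is attached, and measuring writes to that disk in multiples of its full contents (notation introduced here for illustration):

```latex
\begin{aligned}
W_{\text{current}}  &= \underbrace{(1 - p)}_{\text{running resilver}} + \underbrace{1}_{\text{deferred resilver}} = 2 - p \\
W_{\text{proposed}} &= (1 - p) + \underbrace{p}_{\text{stops at } p} = 1 \\
W_{\text{current}} - W_{\text{proposed}} &= 1 - p
\end{aligned}
```

At p = 0.25 this gives 1.75 against 1, so three quarters of a disk's worth of data is written twice; the waste is largest for disks attached early in the running resilver.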

While restarting the resilver for low progress will help reduce the wasted time in common use cases (adding a disk to a mirror, then another) it isn't really solving the actual problem which is that time/writes are being wasted, it's just reducing it.
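The trade-off between restarting and deferring can be sketched with the same accounting as the 25% example, counting extra writes in multiples of one disk's contents. The function names and the `threshold` parameter are hypothetical; the threshold only mimics the idea behind #15810 (restart instead of deferring while progress is low) and is not that tunable's actual code or default.

```python
# Extra writes (in multiples of one disk's contents) when a second disk is
# attached while the running resilver is a fraction p complete. Hypothetical
# model following the 25% example in this thread.


def wasted_if_deferred(p: float) -> float:
    # The deferred pass rewrites the whole second disk, although the running
    # resilver already covered [p, 1) for it.
    return 1.0 - p


def wasted_if_restarted(p: float) -> float:
    # Restarting rescans [0, p) for the first disk, which it already had.
    return p


def wasted_with_threshold(p: float, threshold: float) -> float:
    # Restart while progress is below the threshold, otherwise defer.
    return wasted_if_restarted(p) if p < threshold else wasted_if_deferred(p)


def wasted_with_proposal(p: float) -> float:
    # The deferred pass stops at p for the second disk, so nothing repeats.
    # (p is kept only so all four functions share a signature.)
    return 0.0
```

Under this model the best any threshold can do is min(p, 1 - p), which is zero only when p is exactly 0 or 1. That matches the point above: restarting reduces the wasted work but does not remove it.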

> Also, ugh, 25 open feature requests? Are you planning on helping with any of them or are you just adding more work for others? I don't like this. Just reading and extracting the ask from your very verbal descriptions seems like a lot of work.

I'm so sorry for suggesting improvements to ZFS in good faith using the feature request option of this project's issue tracker, I'm just using the tool put in front of me to suggest improvements that I believe would be useful to myself and other ZFS users – wasn't expecting to be attacked for that.

@snajpa
Contributor

snajpa commented Nov 27, 2024

> For example, if a resilver is already in progress and is 25% complete, and a second disk is added, ZFS will resilver that second disk from 25% to 100%, but the deferred resilver will then do the same again from 0% to 100%, rather than just doing 0% to 25% (the missing section).

There's a tunable you can change dynamically.

> wasn't expecting to be attacked for that

Would you be so kind as to show me where you have been attacked? I asked whether you're planning on helping with any of those; can I not ask that without it automatically being an attack? Are you aware of the unnecessary cognitive load that comes along with your feature requests, or is that an attack in your mind too?

My position on this is that I could propose 10 new features every day. What good is it going to be if it's way beyond what the project can handle? Trivial changes are trivial, but... If you consider this an attack, well, there's nothing I can do about that, I'm just exercising my right to express an opinion. Feature requests alone aren't as useful as you paint them to be. In my mind, if you'd lend a helping hand to the project in other ways too, I'd be looking at how to help you achieve that functionality, but this way, I'm rather looking for ways to reduce future maintenance load, as keeping the codebase bug-free is becoming increasingly difficult.

@Haravikk
Author

> There's a tunable you can change dynamically.

Which doesn't do what I'm describing in this issue. As I've already said, the problem isn't that the resilver doesn't restart, it's that the deferred resilver resilvers the entire disk when it doesn't need to.

If deferred resilvers were able to resilver only the missing portion (up to the point where the deferred disk was added) then restarting the resilver wouldn't be necessary at all – it would simply run to completion, then the deferred resilver would finish the remaining part (and only the remaining part).

> Would you be so kind as to show me where you have been attacked?

You dedicated an entire comment (now two) to criticising me for what – having posted multiple issues, over several years? This has precisely zero relevance to the issue – either the feature request has an issue you want to discuss, or it doesn't.

Again, I'm so sorry for being a user of ZFS and for having thoughts about what might make it better – I'm really genuinely sorry that you disapprove of this project having a section for feature requests and I've used it, but if you feel so strongly about that my issue is not the place to be discussing it.

I had a feature request, so I posted a feature request.

@snajpa
Contributor

snajpa commented Nov 27, 2024

> Again, I'm so sorry for being a user of ZFS and for having thoughts about what might make it better – I'm really genuinely sorry that you disapprove of this project having a section for feature requests and I've used it, but if you feel so strongly about that my issue is not the place to be discussing it.

Given your behavior I'm now starting to think there should be a quota on feature requests, because beyond a certain point, it's just spam. An endless stream of feature requests is counterproductive. Oh, I just discovered GitHub has a block function, neat. Meh, but it doesn't hide the spam from you. Context for others: read through the feature requests, particularly this has been a tipping point for me #16800 (comment) - IMHO your contribution is on par with the guy you're criticizing there, you see, not everyone shares your view that feature requests alone are that useful.

@AllKind
Contributor

AllKind commented Nov 27, 2024

Maybe putting your feature requests into the ideas section of the Discussions tab would be a fair deal?
The whole purpose of that section is just to do that.
You think that's doable for you?

Not every criticism is automatically an attack. Our egos just love to interpret it that way, which then leads us down to a path fighting with each other.

You give your requests a lot of thought and that's something to appreciate.
I just think it would be better to put them into the ideas section. It's relatively new and there is no moderator around to "enforce" its usage. This still is an open source community project, with all the problems many of them face - too little manpower.
Putting the requests into the issue section may cause the reaction you have seen. It's felt as pressure.
I hope that's understandable.
Peace out :-)

@snajpa
Contributor

snajpa commented Nov 27, 2024

thanks @AllKind

I just wanted to add that I'd be willing to help you out however you need @Haravikk if you'd like to take a shot at implementing any one of those. I don't want to block useful stuff from coming into existence; I just don't like that it currently seems (to me at least) like a time-sink without a bottom... but this isn't personal, it's not really directed at your person at all... at this point, it's a survival reflex, this codebase really needs more love than more hacks, you know :))

@Haravikk
Author

Haravikk commented Nov 27, 2024

> Maybe putting your feature requests into the ideas section of the Discussions tab would be a fair deal?

If whoever's maintaining the github project wants to relocate feature requests somewhere else and retire that function then I'll post wherever the new location is, but as far as I'm concerned feature requests is where feature requests are currently supposed to be.

I'm well aware open source projects don't have infinite development resources, and if I could I'd contribute to as many as I could. I'm under no illusions that every feature I've posted will be implemented, or implemented quickly. But I'm an active ZFS user, and I encounter cases that could be improved.

Maybe I'll find the time to work on it myself (currently unlikely, and resilvering is the last thing I'd want to be cutting my teeth on), maybe someone else will, but if the feature request isn't out there how is anyone supposed to know about or discuss it? It's not a sign of failure for open source projects to have a list of possible improvements.

But I don't really want to keep discussing it here – it's not relevant to the feature request, that's a separate issue for how the github project is structured and used.

Either we want to handle deferred resilvers better or not – if other options are considered "good enough for now" that doesn't bother me, but since ZFS already tracks resilver_txg for added disks, recording this for disks that will be deferred seems like a good way to allow the root problem to be resolved, which is that work gets repeated either by restarting the resilver (repeats sections of the non-deferred disk) or is deferred (resilvers the deferred disk twice, once partially, then once again fully, when only part is needed).

@AllKind
Contributor

AllKind commented Nov 27, 2024

I can't speak for Behlendorf or any other member with admin access.
Just out of my own head and logic: This is the issue tracker. Feature request is just a label out of many. On the other hand there is the discussion section, which has an Ideas section just for the purpose of proposing new features.
For me it sounds logical to use that. Others also do it. I tried my argument; obviously I couldn't convince you :-(

3 participants