
support fused_back_pass for prodigy-plus-schedule-free #1867

Open · wants to merge 2 commits into base: sd3
Conversation

@michP247 commented Jan 6, 2025

Copied the internals from https://github.com/LoganBooker/prodigy-plus-schedule-free into kohya library/prodigy_plus_schedulefree.py and made training scripts support either prodigyplus or fused adafactor when setting FBP

From my short tests of DreamBooth training with args:
--fused_backward_pass --optimizer_type="prodigyplus.ProdigyPlusScheduleFree"

SD3.5 Medium, 512x512 res, w/ --full_bf16:
base Prodigy = 27.2 GB VRAM
prodigy-plus-schedule-free = 15.4 GB
prodigy-plus-schedule-free w/ FBP = 10.2 GB

SDXL, 1024x1024, w/ --full_bf16:
base Prodigy = 33 GB
prodigy-plus-schedule-free = 19 GB
prodigy-plus-schedule-free w/ FBP = 13 GB

I didn't test Flux, but the gains should be similar.

@FurkanGozukara

wow nice

@michP247 you find this better than other optimizers?

@michP247 (Author) commented Jan 6, 2025

> wow nice
>
> @michP247 you find this better than other optimizers?

Will check results later; I haven't actually completed any training in my tests, just did a quick VRAM check last night lol (edited to mention I was using full bf16). Still need to figure out the correct --prodigy_steps value.

@kohya-ss (Owner) commented Jan 6, 2025

Thanks for this pull request!

But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?
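
Roughly speaking, that combination resolves to constructing the optimiser directly with fused_back_pass enabled. A minimal sketch, assuming the package is importable as prodigyplus (as the optimizer_type string implies) and that lr=1.0 is the usual Prodigy-style setting:

import torch
from prodigyplus import ProdigyPlusScheduleFree  # assumption: class exposed at package top level

model = torch.nn.Linear(4, 4)  # stand-in for the network being trained
optimizer = ProdigyPlusScheduleFree(
    model.parameters(),
    lr=1.0,                # Prodigy-style optimisers are typically run at lr=1.0
    fused_back_pass=True,  # the kwarg passed via --optimizer_args above
)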

@michP247 (Author) commented Jan 6, 2025

> Thanks for this pull request!
>
> But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?

Hm, OK, so I just tried and it does already work as an optimizer arg, which I overlooked. But at least now it won't break anything if --fused_backward_pass is passed regularly.

@michP247 (Author) commented Jan 6, 2025

> Thanks for this pull request!
> But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?
>
> Hm, OK, so I just tried and it does already work as an optimizer arg, which I overlooked. But at least now it won't break anything if --fused_backward_pass is passed regularly.

Which will be good for the bmaltais GUI, since we'd simply be able to use the FBP checkbox with this optimizer.

@Exist-c commented Jan 6, 2025

Update: I apologize for the late reply. The issue has been fixed in ProdigyPlusScheduleFree v1.8.3; thanks to @LoganBooker for his work. I tested v1.8.4 and it works fine now, no longer needing the modifications from my commit.


Previous comment (the issue is about register_post_accumulate_grad_hook and groups_to_process):

I attempted to add a fused backward pass to train_network.py; my changes: sd3...Exist-c:sd-scripts:sd3

Based on the implementation in sdxl_train.py and my tests in train_network.py, I think the optimizer's step_param should be registered on the parameters, similar to Adafactor; otherwise the optimizer will do nothing.
I'm not certain whether Flux or SD3.5 requires this, but I thought it would be helpful to mention it.
Here is my implementation in train_network.py:

# accelerator has wrapped the optimizer,
# so we need optimizer.optimizer to access the original step_param function
for param_group in optimizer.optimizer.param_groups:
    for parameter in param_group["params"]:
        if parameter.requires_grad:

            def __grad_hook(tensor: torch.Tensor, param_group=param_group):
                if accelerator.sync_gradients and args.max_grad_norm != 0.0:
                    accelerator.clip_grad_norm_(tensor, args.max_grad_norm)
                optimizer.optimizer.step_param(tensor, param_group)
                tensor.grad = None  # clear grad to save memory

            parameter.register_post_accumulate_grad_hook(__grad_hook)
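
Two side notes on the snippet above: parameter.register_post_accumulate_grad_hook needs a reasonably recent PyTorch (it was added in the 2.1 series), and the param_group=param_group default argument is what binds the current group to each hook at definition time, so the hooks don't all capture the final value of the loop variable.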

And in my implementation, if both the text_encoder and unet are training, step_param ends up being called prematurely for parameters of the next step, leading to errors. I made some modifications in on_end_step(), but I think they change the optimizer's behavior, so it is not the correct solution.

def patch_on_end_step(optimizer, group):
    group_index = optimizer.optimizer.param_groups.index(group)

    # my patch; I think it's wrong
    if group_index not in optimizer.optimizer.groups_to_process:
        return False

    # Decrement params processed so far.
    optimizer.optimizer.groups_to_process[group_index] -= 1
    ...

I'm not good at English, and the above translations are all done by machine translation. I hope I haven't offended anyone.

@LoganBooker commented Jan 8, 2025

Update: As of this commit for Prodigy+SF, all that should be needed in this pull request is to alter the assert; it will then be sufficient to set args.fused_backward_pass=True to activate FBP -- the optimiser will take care of the rest. Note that like Adafactor, Kohya only supports FBP for full finetuning (as far as I'm aware).

Previous comment follows.


Hello all, and thanks for your interest in the optimiser. I made a best-effort attempt to match how Kohya had implemented fused backward pass for Adafactor, in the hope it would be fairly straightforward to add support. Seems it's a bit more involved!

I've had a closer look at the SD3 branch and decided it would be easier perhaps to monkey patch the Adafactor patching method. This has been done in my most recent commit (LoganBooker/prodigy-plus-schedule-free@93339d8). Note I haven't created a new package for this just yet, so you'll need to use the code/repo directly to get this change.

What this means is that for this pull request, all you should need to do is tweak the hard-coded assert in train_util.py to allow Prodigy+SF as well. That's it (apart from installing/importing/selecting the optimiser itself).

if args.fused_backward_pass:
    assert (
        optimizer_type == "Adafactor".lower()
    ), "fused_backward_pass currently only works with optimizer_type Adafactor / fused_backward_passは現在optimizer_type Adafactorでのみ機能します"

Once that's done, you should be able to use the fused backward pass by passing fused_backward_pass=True to the optimiser, and setting args.fused_backward_pass=True to Kohya. Alternatively, you could retain the change that appends it to the optimiser arguments.

@michP247 (Author) commented Jan 8, 2025

> Hello all, and thanks for your interest in the optimiser. I made a best-effort attempt to match how Kohya had implemented fused backward pass for Adafactor, in the hope it would be fairly straightforward to add support. Seems it's a bit more involved!
>
> I've had a closer look at the SD3 branch and decided it would be easier perhaps to monkey patch the Adafactor patching method. This has been done in my most recent commit (LoganBooker/prodigy-plus-schedule-free@93339d8). Note I haven't created a new package for this just yet, so you'll need to use the code/repo directly to get this change.
>
> What this means is that for this pull request, all you should need to do is tweak the hard-coded assert in train_util.py to allow Prodigy+SF as well. That's it (apart from installing/importing/selecting the optimiser itself).
>
> if args.fused_backward_pass:
>     assert (
>         optimizer_type == "Adafactor".lower()
>     ), "fused_backward_pass currently only works with optimizer_type Adafactor / fused_backward_passは現在optimizer_type Adafactorでのみ機能します"
>
> Once that's done, you should be able to use the fused backward pass by passing fused_backward_pass=True to the optimiser, and setting args.fused_backward_pass=True to Kohya. Alternatively, you could retain the change that appends it to the optimiser arguments.

Someone should take a look at this and at the suggestion from Exist-c. I won't be able to update this PR for a while, as I'm dealing with some PC troubles.
