
support fused_back_pass for prodigy-plus-schedule-free #1867

Open · wants to merge 2 commits into base: sd3
Conversation

@michP247 commented Jan 6, 2025

Copied the internals from https://github.com/LoganBooker/prodigy-plus-schedule-free into kohya library/prodigy_plus_schedulefree.py and made training scripts support either prodigyplus or fused adafactor when setting FBP

From my short tests of DreamBooth training with args:
--fused_backward_pass --optimizer_type="prodigyplus.ProdigyPlusScheduleFree"

SD3.5 Medium, 512x512 res, w/ --full_bf16:
base Prodigy = 27.2 GB VRAM
prodigy-plus-schedule-free = 15.4 GB
prodigy-plus-schedule-free w/ FBP = 10.2 GB

SDXL, 1024x1024, w/ --full_bf16:
base Prodigy = 33 GB
prodigy-plus-schedule-free = 19 GB
prodigy-plus-schedule-free w/ FBP = 13 GB

I didn't test Flux, but the gains should be similar.

@FurkanGozukara

wow nice

@michP247 you find this better than other optimizers?

@michP247 (Author) commented Jan 6, 2025

> wow nice
>
> @michP247 you find this better than other optimizers?

Will check results later; I haven't actually completed any training in my tests, just did a quick VRAM check last night lol (edited to mention I was using full bf16). Still need to figure out the correct --prodigy_steps value.

@kohya-ss (Owner) commented Jan 6, 2025

Thanks for this pull request!

But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?
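
Roughly speaking, that combination resolves to constructing the optimiser directly with fused_back_pass enabled. A minimal sketch, assuming the package is importable as prodigyplus (as the optimizer_type string implies) and that lr=1.0 is the usual Prodigy-style setting:

import torch
from prodigyplus import ProdigyPlusScheduleFree  # assumption: class exposed at package top level

model = torch.nn.Linear(4, 4)  # stand-in for the network being trained
optimizer = ProdigyPlusScheduleFree(
    model.parameters(),
    lr=1.0,                # Prodigy-style optimisers are typically run at lr=1.0
    fused_back_pass=True,  # the kwarg passed via --optimizer_args above
)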

@michP247 (Author) commented Jan 6, 2025

> Thanks for this pull request!
>
> But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?

Hm, OK, so I just tried and it does already work as an optimizer arg, which I overlooked. But at least now it won't break anything if --fused_backward_pass is passed regularly.

@michP247 (Author) commented Jan 6, 2025

> Thanks for this pull request!
> But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?
>
> Hm, OK, so I just tried and it does already work as an optimizer arg, which I overlooked. But at least now it won't break anything if --fused_backward_pass is passed regularly.

Which will be good for the bmaltais GUI, since we'd simply be able to use the FBP checkbox with this optimizer.

@Exist-c commented Jan 6, 2025

Update: I apologize for the late reply. The issue has been fixed in ProdigyPlusScheduleFree v1.8.3; thanks to @LoganBooker for his work. I tested v1.8.4 and it works fine now, no longer needing the modifications from my commit.


Previous comment (the issue is about register_post_accumulate_grad_hook and groups_to_process):

I attempted to add a fused backward pass to train_network.py; my changes: sd3...Exist-c:sd-scripts:sd3

Based on the implementation in sdxl_train.py and my tests in train_network.py, I think the optimizer's step_param should be registered on the parameters, similar to Adafactor; otherwise the optimizer will do nothing.
I'm not certain whether Flux or SD3.5 requires this, but I thought it would be helpful to mention it.
Here is my implementation in train_network.py:

# accelerator has wrapped the optimizer,
# so we need optimizer.optimizer to access the original step_param function
for param_group in optimizer.optimizer.param_groups:
    for parameter in param_group["params"]:
        if parameter.requires_grad:

            def __grad_hook(tensor: torch.Tensor, param_group=param_group):
                if accelerator.sync_gradients and args.max_grad_norm != 0.0:
                    accelerator.clip_grad_norm_(tensor, args.max_grad_norm)
                optimizer.optimizer.step_param(tensor, param_group)
                tensor.grad = None  # clear grad to save memory

            parameter.register_post_accumulate_grad_hook(__grad_hook)
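
Two side notes on the snippet above: parameter.register_post_accumulate_grad_hook needs a reasonably recent PyTorch (it was added in the 2.1 series), and the param_group=param_group default argument is what binds the current group to each hook at definition time, so the hooks don't all capture the final value of the loop variable.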

And in my implementation, if both the text_encoder and unet are training, step_param ends up being called prematurely for parameters of the next step, leading to errors. I made some modifications in on_end_step(), but I think they change the optimizer's behavior, so it is not the correct solution.

def patch_on_end_step(optimizer, group):
    group_index = optimizer.optimizer.param_groups.index(group)

    # my patch; I think it's wrong
    if group_index not in optimizer.optimizer.groups_to_process:
        return False

    # Decrement params processed so far.
    optimizer.optimizer.groups_to_process[group_index] -= 1
    ...

I'm not good at English, and the above translations are all done by machine translation. I hope I haven't offended anyone.

@LoganBooker commented Jan 8, 2025

Update: As of this commit for Prodigy+SF, all that should be needed in this pull request is to alter the assert; it will then be sufficient to set args.fused_backward_pass=True to activate FBP -- the optimiser will take care of the rest. Note that like Adafactor, Kohya only supports FBP for full finetuning (as far as I'm aware).

Previous comment follows.


Hello all, and thanks for your interest in the optimiser. I made a best-effort attempt to match how Kohya had implemented fused backward pass for Adafactor, in the hope it would be fairly straightforward to add support. Seems it's a bit more involved!

I've had a closer look at the SD3 branch and decided it would be easier perhaps to monkey patch the Adafactor patching method. This has been done in my most recent commit (LoganBooker/prodigy-plus-schedule-free@93339d8). Note I haven't created a new package for this just yet, so you'll need to use the code/repo directly to get this change.

What this means is that for this pull request, all you should need to do is tweak the hard-coded assert in train_util.py to allow Prodigy+SF as well. That's it (apart from installing/importing/selecting the optimiser itself).

if args.fused_backward_pass:
    assert (
        optimizer_type == "Adafactor".lower()
    ), "fused_backward_pass currently only works with optimizer_type Adafactor / fused_backward_passは現在optimizer_type Adafactorでのみ機能します"

Once that's done, you should be able to use the fused backward pass by passing fused_backward_pass=True to the optimiser, and setting args.fused_backward_pass=True to Kohya. Alternatively, you could retain the change that appends it to the optimiser arguments.

@michP247 (Author) commented Jan 8, 2025

> Hello all, and thanks for your interest in the optimiser. I made a best-effort attempt to match how Kohya had implemented fused backward pass for Adafactor, in the hope it would be fairly straightforward to add support. Seems it's a bit more involved!
>
> I've had a closer look at the SD3 branch and decided it would be easier perhaps to monkey patch the Adafactor patching method. This has been done in my most recent commit (LoganBooker/prodigy-plus-schedule-free@93339d8). Note I haven't created a new package for this just yet, so you'll need to use the code/repo directly to get this change.
>
> What this means is that for this pull request, all you should need to do is tweak the hard-coded assert in train_util.py to allow Prodigy+SF as well. That's it (apart from installing/importing/selecting the optimiser itself).
>
> if args.fused_backward_pass:
>     assert (
>         optimizer_type == "Adafactor".lower()
>     ), "fused_backward_pass currently only works with optimizer_type Adafactor / fused_backward_passは現在optimizer_type Adafactorでのみ機能します"
>
> Once that's done, you should be able to use the fused backward pass by passing fused_backward_pass=True to the optimiser, and setting args.fused_backward_pass=True to Kohya. Alternatively, you could retain the change that appends it to the optimiser arguments.

Someone should take a look at this and at the suggestion from Exist-c. I won't be able to update this PR for a while, as I'm dealing with some PC troubles.
