support fused_back_pass for prodigy-plus-schedule-free #1867
base: sd3
Conversation
wow nice @michP247, do you find this better than other optimizers?
will check results later, I haven't actually completed any training in my tests, just did a quick VRAM check last night lol (edited to mention I was using full bf16). Still need to figure out the correct --prodigy_steps value.
Thanks for this pull request! But I think it may work with the optimizer args option.
Hm, OK, so I just tried it and it does already work as an optimizer arg, which I overlooked. But at least now it won't break anything if --fused_backward_pass is passed regularly.
That will be good for the bmaltais GUI, since we'd simply be able to use the FBP checkbox with this optimizer.
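For reference, the optimizer-arg route would look something like this (a hedged example; fused_back_pass is the optimiser's own option name, passed through sd-scripts' --optimizer_args):
--optimizer_type="prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True"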
Update: I apologize for the late reply. The issue has been fixed in ProdigyPlusScheduleFree v1.8.3; thanks to @LoganBooker for his work. I tested v1.8.4 and it works fine now, no longer needing the modifications from my commit.

Previous comment: the issue is about register_post_accumulate_grad_hook and groups_to_process. I attempted to add a fused backward pass to train_network.py; my changes: sd3...Exist-c:sd-scripts:sd3. Based on the implementation in sdxl_train.py and my tests in train_network.py, I think the optimizer's step_param should be registered to the parameters, similar to Adafactor. Otherwise, the optimizer will do nothing.
And in my implementation, if both the text encoder and unet are training, step_param would be called prematurely on the next step's parameters, leading to errors. I made some modifications in on_end_step(), but I think that changes the optimizer's behavior, so it is not the correct solution.
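A minimal sketch of the per-parameter step_param registration being described, loosely following the Adafactor pattern in sdxl_train.py (illustrative only, not the exact sd-scripts code; assumes PyTorch >= 2.1 and an optimizer that exposes step_param(param, group)):

```python
import torch

def register_fused_backward_hooks(optimizer):
    # Register one post-accumulate-grad hook per trainable parameter so the
    # optimizer steps each parameter as soon as its gradient is ready, then
    # frees that gradient (this is where the VRAM saving comes from).
    for param_group in optimizer.param_groups:
        for parameter in param_group["params"]:
            if not parameter.requires_grad:
                continue

            def make_hook(group):
                def hook(param: torch.Tensor):
                    optimizer.step_param(param, group)
                    param.grad = None
                return hook

            parameter.register_post_accumulate_grad_hook(make_hook(param_group))
```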
I'm not good at English, and the above was all done by machine translation. I hope I haven't offended anyone.
Update: As of this commit for Prodigy+SF, all that should be needed in this pull request is to alter the assert; it will then be sufficient to set --fused_backward_pass. Previous comment follows.

Hello all, and thanks for your interest in the optimiser. I made a best-effort attempt to match how Kohya had implemented the fused backward pass for Adafactor, in the hope it would be fairly straightforward to add support. It seems it's a bit more involved! I've had a closer look at the SD3 branch and decided it would perhaps be easier to monkey patch the Adafactor patching method. This has been done in my most recent commit (LoganBooker/prodigy-plus-schedule-free@93339d8). Note I haven't created a new package for this just yet, so you'll need to use the code/repo directly to get this change. What this means is that for this pull request, all you should need to do is tweak the hard-coded assert in sd-scripts/library/train_util.py (lines 4633 to 4636 at e896539).
Once that's done, you should be able to use the fused backward pass by passing --fused_backward_pass.
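A hedged sketch of the assert tweak mentioned above (the exact check in library/train_util.py may be worded differently; this only illustrates relaxing it to accept the new optimizer alongside Adafactor):

```python
# Sketch only: relax the hard-coded optimizer check guarding --fused_backward_pass.
if args.fused_backward_pass:
    allowed = ("adafactor", "prodigyplus.prodigyplusschedulefree")
    assert args.optimizer_type.lower() in allowed, (
        "fused_backward_pass currently supports Adafactor or ProdigyPlusScheduleFree only"
    )
```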
Someone should take a look at this and the suggestion from Exist-c. I won't be able to update this PR for a while, since I need to fix some PC troubles.
Copied the internals from https://github.com/LoganBooker/prodigy-plus-schedule-free into kohya's library/prodigy_plus_schedulefree.py and made the training scripts support either ProdigyPlus or fused Adafactor when FBP is set (see the sketch below).
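Roughly, the dispatch described above could look like this (a sketch under the assumption that patch_adafactor_fused is the existing helper in library/adafactor_fused.py and that ProdigyPlusScheduleFree handles its own hooks when constructed with fused_back_pass=True; not the exact code in this PR):

```python
def enable_fused_backward_pass(args, optimizer):
    # Sketch only: pick the fused path based on which optimizer is in use.
    if not args.fused_backward_pass:
        return
    if "prodigyplus" in args.optimizer_type.lower():
        # ProdigyPlusScheduleFree registers its own per-parameter hooks when
        # created with fused_back_pass=True, so nothing extra is needed here.
        return
    # Fall back to the existing fused Adafactor patch.
    from library.adafactor_fused import patch_adafactor_fused
    patch_adafactor_fused(optimizer)
```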
From my short tests of DreamBooth training with the args
--fused_backward_pass --optimizer_type="prodigyplus.ProdigyPlusScheduleFree"
sd3.5 medium, 512x512 res, w/ --full_bf16:
base prodigy = 27.2 GB VRAM usage
prodigy-plus-schedule-free = 15.4 GB
prodigy-plus-schedule-free w/ FBP = 10.2 GB

sdxl, 1024x1024, w/ --full_bf16:
base prodigy = 33 GB
prodigy-plus-schedule-free = 19 GB
prodigy-plus-schedule-free w/ FBP = 13 GB
Didn't test Flux, but the gains should be similar.
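For completeness, the --prodigy_steps value mentioned earlier would presumably be passed the same way, via --optimizer_args (hedged example; check the optimiser docs for a recommended value):
--fused_backward_pass --optimizer_type="prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "prodigy_steps=..."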