-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stable diffusion 3.5 support. #2578
Conversation
nice work cleaning up and integrating thank you. |
Mostly by looking at the python reference implementation here: https://github.com/Stability-AI/sd3.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into sd3.5/mmdit-x.py in comparison with sd3-ref/mmdit.py. Looks like so far this PR is missing some changes to MMDiT blocks, including
- an extra attention for the MMDiT-X block is placed here.
- modified pre_attention for x_block (code here)
- modified post-attention for x_block (code here)
- and different block-joining (between x and context) is present here.
I noticed you already finished adapting the projection layer. Looks like this PR is still WIP or maybe changes in some files are missing in the pushing? Let me know if I got it right and if/how I can help.
Interestingly, I am still able to reproduce generating beautiful images, though. I'm not sure if my guess is right, but perhaps the original SD3.0's MMDiT could simply benefit from stacking more layers without having to change the architecture.
#[arg(long, default_value_t = 4.0)] | ||
cfg_scale: f64, | ||
#[arg(long)] | ||
cfg_scale: Option<f64>, | ||
|
||
// Time shift factor (alpha). | ||
#[arg(long, default_value_t = 3.0)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed SAI's reference implementation of sd3.5 seems to change the recommended time_shift for sd3.0-medium from 3.0 to 1.0, but it's for dpmpp_2m sampler (code here). This wasn't the case in their original reference impl for sd3.0-medium, which is always 3.0 but for euler sampling (code here).
Not sure if it's worth to change the default behavior here. The difference should be small, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, as both the scheduler and the parameter changed I preferred keeping everything as they were before and not modify the existing behavior.
{ | ||
let vb_vae = vb_fp16.rename_f(sd3_vae_vb_rename).pp("first_stage_model"); | ||
let autoencoder = build_sd3_vae_autoencoder(vb_vae)?; | ||
|
||
// Apply TAESD3 scale factor. Seems to be significantly improving the quality of the image. | ||
// https://github.com/comfyanonymous/ComfyUI/blob/3c60ecd7a83da43d694e26a77ca6b93106891251/nodes.py#L721-L723 | ||
autoencoder.decode(&((x / 1.5305)? + 0.0609)?)? | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know the VAE hasn't changed so this part could share across sd3.0 and sd3.5. Maybe it's worth to wrap those lines into a small function to avoid duplication, in case we need to change or tweak this part in the future. We can still keep the call site into a curly brace scope to control the lifecycle of VAE model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I've pushed some changes to avoid a bit of the duplication (so the vae application and creation as well as the text model/mmdit application)
Thanks for looking into this, note that the new attention mechanism that you mentioned is I think triggered based an the existence of the |
@LaurentMazare Got it! I didn't expect that. Seems the new archetecture is for SD3.5-medium only. Yes I agree we should wait until SD3.5-medium comes out, which should be around the corner on Oct 29. I think I should have the bandwidth to do the support and testing over the next weekend. Let me know if you want me to be on it. This PR LGTM. Thanks for the prompt support! |
Thanks, yeah let's have a look on the 29th and if it needs the new cross attention it would be great if you can add it. |
The medium model is out now https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main and it seems to have some attn2 layers so I guess we will need to support it properly, would be great if you have the time to look at it. |
@LaurentMazare Done in #2587 |
* Stable diffusion 3.5 support. * Clippy fixes. * CFG fix. * Remove some unnecessary clones. * Avoid duplicating some of the code.
Image generated with 3.5 turbo.