[Feature Request] Support Basic Diffusers MoE (e.g. SDXL Refiner) #17

iwr-redmond opened this issue Dec 14, 2024 · 0 comments
iwr-redmond commented Dec 14, 2024

I reckon I must have been raised in a barn, because I was unaware that, like Fooocus, Diffusers can transfer latents between two checkpoints at a chosen point in the generation.

Max Woolf demonstrates this in passing while describing his SDXL negative LoRA, calling the technique a "Mixture of Experts":

```python
import torch
from diffusers import DiffusionPipeline

# prompt / negative_prompt assumed defined; pipelines loaded as in the
# diffusers SDXL documentation
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

high_noise_frac = 0.8  # equivalent to the "Refiner Switch At" in Fooocus

# Base model denoises the first 80% of the schedule and outputs latents
image = base(
    prompt=prompt,
    negative_prompt=negative_prompt,
    denoising_end=high_noise_frac,
    output_type="latent",
).images

# Refiner picks up those latents for the remaining 20% of the schedule
image = refiner(
    prompt=prompt,
    negative_prompt=negative_prompt,
    denoising_start=high_noise_frac,
    image=image,
).images[0]
```
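For intuition, `denoising_end` / `denoising_start` split the step schedule by fraction. A rough sketch of the arithmetic (simplified; the actual scheduler works on timesteps rather than step counts):

```python
# Hypothetical step count for illustration
num_inference_steps = 50
high_noise_frac = 0.8

# Roughly: the base model handles the first 80% of the steps,
# and the refiner finishes the remainder
base_steps = round(num_inference_steps * high_noise_frac)
refiner_steps = num_inference_steps - base_steps

print(base_steps, refiner_steps)  # 40 10
```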

Notes:

  1. Whether this implementation "can reuse the base model's momentum" (per lllyasviel's comment here) is not immediately clear.

  2. Doubling the required model memory will pose some VRAM issues, but if I understand correctly both Fooocus and SD.Next handle low-VRAM setups by keeping everything in system RAM and using the GPU only for the specific portion of the model needed at each generation step. That might be overkill here: merely loading the two checkpoints into CPU memory and moving each one to the GPU in turn could be close enough. That would correspond to diffusers' `enable_model_cpu_offload()`, rather than `enable_sequential_cpu_offload()`, which minimizes memory as far as possible at the expense of speed.

  3. Unlike Fooocus, this would not allow latents to be interposed between SDXL (primary) and an SD1.5 checkpoint (refiner/secondary), although I reckon it would allow an SD1.5 inpainting checkpoint to be used as a refiner. I believe a multi-architecture feature would need the Latent Interposer code and checkpoints, which would be a much larger effort outside the scope of this feature request.
