
Luo-Yihong/YOSO


You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

This is the official repository of "You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs", by Yihong Luo, Xiaolong Chen, Xinghua Qu, Tianyang Hu, and Jing Tang.

🔥News

  • (2025/01/23) YOSO is accepted by ICLR 2025 🎉
  • (2024/10/21) We have released an updated version of the technical report. In particular, we retrained YOSO-LoRA with more computational resources and better data, achieving better one-step performance. See the technical report for details! The newly trained LoRA may be released in the next few months.
  • (2024/06/04) We have updated YOSO-PixArt-α-512; the new checkpoint performs better than the previously released one. Enjoy!
  • (2024/05/28) The training code of YOSO-LoRA is released!

Pre-trained Models

This is an early version of our pre-trained models; they are expected to be updated soon.

Note that YOSO-PixArt-α-512 is trained on JourneyDB at 512 resolution. YOSO-PixArt-α-1024 is obtained by directly merging YOSO-PixArt-α-512 with PixArt-XL-2-1024-MS, without any additional training at 1024 resolution.
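The README does not say how the two checkpoints are merged. Purely as an illustration, the sketch below shows one common recipe, task-vector arithmetic: add the weight change that YOSO training produced at 512 resolution (YOSO-PixArt-α-512 minus the 512-resolution PixArt-α model it was fine-tuned from, which is an assumption here) to PixArt-XL-2-1024-MS. The function `merge_task_vector` and the flat dict-of-lists stand-in for real state dicts are hypothetical; this is not the authors' confirmed procedure.

```python
from typing import Dict, List

# Hypothetical stand-in for a state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_task_vector(finetuned_512: StateDict, base_512: StateDict,
                      base_1024: StateDict, scale: float = 1.0) -> StateDict:
    """Assumed merge rule: base_1024 + scale * (finetuned_512 - base_512).

    Illustrative task-vector arithmetic only, not the authors' confirmed method.
    """
    merged: StateDict = {}
    for name, w_1024 in base_1024.items():
        if name in finetuned_512 and name in base_512:
            ft, base = finetuned_512[name], base_512[name]
            if not (len(ft) == len(base) == len(w_1024)):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            # Weight change learned by fine-tuning at 512 resolution.
            delta = [f - b for f, b in zip(ft, base)]
            merged[name] = [w + scale * d for w, d in zip(w_1024, delta)]
        else:
            # Parameters without a 512-resolution counterpart are copied unchanged.
            merged[name] = list(w_1024)
    return merged
```

With real PyTorch state dicts the same expression, `base_1024[k] + scale * (finetuned_512[k] - base_512[k])`, applies tensor-wise to every shared key.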

Training

Prepare data

To train YOSO-LoRA, we use captions from JourneyDB to generate the training data.

You need to download at least the JSON file used for data preparation.
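Since the training data are generated from JourneyDB captions, a first step is to pull the captions out of the downloaded JSON file. The sketch below assumes a JSON array or JSON Lines file whose records carry the caption under a field named `prompt`; that layout, the helper names (`parse_captions`, `load_captions`, `batched`), and the deduplication step are assumptions for illustration, not the repository's own preprocessing code.

```python
import json
from typing import Iterator, List


def parse_captions(text: str, key: str = "prompt") -> List[str]:
    """Extract unique, non-empty captions from JSON array or JSON Lines text.

    The record layout and the default `key` are assumptions; check the field
    names in the JourneyDB annotation file you downloaded.
    """
    try:
        records = json.loads(text)
        if isinstance(records, dict):  # a single JSON object
            records = [records]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    seen, captions = set(), []
    for record in records:
        caption = record.get(key) if isinstance(record, dict) else None
        if not isinstance(caption, str):
            continue
        caption = caption.strip()
        if caption and caption not in seen:  # drop empty and duplicate captions
            seen.add(caption)
            captions.append(caption)
    return captions


def load_captions(path: str, key: str = "prompt") -> List[str]:
    """Read a caption file from disk and parse it with `parse_captions`."""
    with open(path, encoding="utf-8") as f:
        return parse_captions(f.read(), key)


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of prompts for the data-generation model."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch from `batched(load_captions(path), batch_size)` could then be passed as a list of prompts to the image generator used to build the training set.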

Run the script for training

You can run the training script as follows:

```shell
accelerate launch --num_processes=4 --mixed_precision=fp16 train_yoso_lora.py \
  --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
  --use_ema \
  --train_batch_size=16 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=5000 \
  --learning_rate=2e-05 \
  --max_grad_norm=1 \
  --enable_xformers_memory_efficient_attention
```
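Assuming `--train_batch_size` is the per-device batch size, as it is in the diffusers `train_text_to_image.py` script this one is adapted from, each optimizer step with the flags above aggregates num_processes × train_batch_size × gradient_accumulation_steps samples. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def effective_batch_size(num_processes: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer update across all processes."""
    return num_processes * per_device_batch * grad_accum_steps


def samples_seen(max_train_steps: int, batch: int) -> int:
    """Total training samples consumed over `max_train_steps` optimizer updates."""
    return max_train_steps * batch


total = effective_batch_size(num_processes=4, per_device_batch=16, grad_accum_steps=4)
print(total)                      # 256
print(samples_seen(5000, total))  # 1280000
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` proportionally keeps the effective batch size unchanged; for example, 2 processes × 16 × 8 also gives 256.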

The training script is largely adapted from train_text_to_image.py in diffusers. Thanks for their impressive work!

Note that, to save computation, the script uses a latent perceptual loss for the consistency loss and an MSE loss for the KL loss.

You can instead use MSE for the consistency loss for better efficiency, or also use the latent perceptual loss for the KL loss for better performance.
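The loss choices described above amount to a per-term configuration. The plain-Python sketch below is a hypothetical illustration: `make_loss`, `build_losses`, and the option names are assumptions, and `feature_fn` merely stands in for whatever latent feature extractor the real script uses for its perceptual loss. The actual implementation in `train_yoso_lora.py` may differ.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
LossFn = Callable[[Vector, Vector], float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def make_loss(kind: str, feature_fn: Callable[[Vector], Vector]) -> LossFn:
    """Return plain MSE, or a perceptual loss computed as MSE between features."""
    if kind == "mse":
        return mse
    if kind == "perceptual":
        return lambda pred, target: mse(feature_fn(pred), feature_fn(target))
    raise ValueError(f"unknown loss kind: {kind!r}")


def build_losses(feature_fn: Callable[[Vector], Vector],
                 consistency: str = "perceptual", kl: str = "mse") -> Dict[str, LossFn]:
    """Defaults mirror the README: perceptual consistency loss, MSE for the KL term."""
    return {
        "consistency": make_loss(consistency, feature_fn),
        "kl": make_loss(kl, feature_fn),
    }
```

Passing `consistency="mse"` corresponds to the cheaper option, while `kl="perceptual"` corresponds to the higher-quality option mentioned above.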

In addition, SD-Turbo is used to generate the training data; you can replace it with SDXL-Turbo or real data for better performance.

Usage

We take YOSO-sd1.5-lora as an example. We highly recommend using YOSO-sd1.5-lora together with realistic-vision-v51, which produces impressive samples in 2 steps.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load the recommended base model in half precision and move it to the GPU.
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
# Sample with the LCM scheduler and load the YOSO LoRA weights.
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights("Luo-Yihong/yoso_sd1.5_lora")

generator = torch.manual_seed(318)
steps = 2
imgs = pipeline(
    prompt="A photo of a man, XT3",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
).images
imgs[0]  # displays the PIL image in a notebook; in a script use imgs[0].save("man.png")
```
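The `guidance_scale=1.5` argument controls classifier-free guidance. In the diffusers Stable Diffusion pipelines, the unconditional and text-conditioned noise predictions are combined as `uncond + guidance_scale * (cond - uncond)`, and guidance is skipped when `guidance_scale <= 1`. The function below is a plain-Python sketch of that arithmetic on flat lists, not code from this repository or from diffusers.

```python
from typing import List, Sequence


def classifier_free_guidance(uncond: Sequence[float], cond: Sequence[float],
                             guidance_scale: float) -> List[float]:
    """Combine noise predictions as uncond + w * (cond - uncond).

    When w <= 1 guidance is disabled and the conditional prediction is used
    directly, which this sketch mirrors.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    if guidance_scale <= 1.0:
        return list(cond)
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]
```

At `guidance_scale=1.5` the result equals `cond + 0.5 * (cond - uncond)`, a much weaker push than the diffusers default of 7.5 used for standard multi-step Stable Diffusion sampling.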


Moreover, we observe that when combined with new base models, YOSO-LoRA can be used with some advanced ODE solvers:

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
# Use the multistep DPM-Solver with the Stable Diffusion v1.5 scheduler config.
pipeline.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
# Load the YOSO LoRA weights; without this line only the base model is sampled.
pipeline.load_lora_weights("Luo-Yihong/yoso_sd1.5_lora")

generator = torch.manual_seed(323)
steps = 2
imgs = pipeline(
    prompt="A photo of a girl, XT3",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
).images
imgs[0]  # displays the PIL image in a notebook
```


We encourage you to experiment with various solvers to obtain better samples. We will try to improve the compatibility of YOSO-LoRA with different solvers.
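One way to run such experiments is to swap schedulers on an already-loaded pipeline with `SchedulerClass.from_config(pipeline.scheduler.config)`, the standard diffusers pattern, and to reuse the same seed for every run. The helper `compare_solvers` and its `make_generator` argument are hypothetical names for this sketch; it only assumes the pipeline exposes a `scheduler` attribute with a `config` and returns an output with an `.images` list.

```python
from typing import Any, Callable, Dict


def compare_solvers(pipeline: Any, scheduler_classes: Dict[str, Any], prompt: str,
                    make_generator: Callable[[], Any], steps: int = 2,
                    guidance_scale: float = 1.5) -> Dict[str, Any]:
    """Sample `prompt` once per scheduler class and return {name: first image}.

    A fresh generator is created for each run so every solver starts from the
    same seed; the pipeline's original scheduler is restored afterwards.
    """
    original = pipeline.scheduler
    results: Dict[str, Any] = {}
    try:
        for name, scheduler_cls in scheduler_classes.items():
            pipeline.scheduler = scheduler_cls.from_config(original.config)
            output = pipeline(
                prompt=prompt,
                num_inference_steps=steps,
                num_images_per_prompt=1,
                generator=make_generator(),
                guidance_scale=guidance_scale,
            )
            results[name] = output.images[0]
    finally:
        pipeline.scheduler = original
    return results
```

For example, after loading the pipeline and the LoRA weights as above, `compare_solvers(pipeline, {"lcm": LCMScheduler, "dpm": DPMSolverMultistepScheduler, "euler": EulerDiscreteScheduler}, "A photo of a girl, XT3", lambda: torch.manual_seed(323))` returns one image per solver for side-by-side comparison.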

Contact

Please contact Yihong Luo ([email protected]) if you have any questions about this work.

Bibtex

```bibtex
@misc{luo2024sample,
      title={You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs}, 
      author={Yihong Luo and Xiaolong Chen and Xinghua Qu and Jing Tang},
      year={2024},
      eprint={2403.12931},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
