Fine-tuning OmniGen helps you handle specific image generation tasks better. For example, by fine-tuning on images of one person, you can generate multiple pictures of that person while keeping them consistent across images.
Much previous work focused on designing new networks for specific tasks. For instance, ControlNet was proposed to handle image conditions, and IP-Adapter was built to preserve ID features. To perform a new task with that approach, you need to build a new architecture and debug it repeatedly. Adding and tuning extra network parameters is usually time-consuming and labor-intensive, which makes it neither user-friendly nor cost-efficient.
By comparison, OmniGen accepts multi-modal conditional inputs and has been pre-trained on a variety of tasks. You can fine-tune it on any task without designing a specialized network such as ControlNet or IP-Adapter for that task.
All you need to do is prepare the data and start training. This lets you go beyond the limitations of previous models and have OmniGen accomplish a variety of interesting tasks, even ones that have never been done before.
git clone https://github.com/VectorSpaceLab/OmniGen.git
cd OmniGen
pip install -e .
accelerate launch \
--num_processes=1 \
--use_fsdp \
--fsdp_offload_params false \
--fsdp_sharding_strategy SHARD_GRAD_OP \
--fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
--fsdp_transformer_layer_cls_to_wrap Phi3DecoderLayer \
--fsdp_state_dict_type FULL_STATE_DICT \
--fsdp_forward_prefetch false \
--fsdp_use_orig_params True \
--fsdp_cpu_ram_efficient_loading false \
--fsdp_sync_module_states True \
train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--json_file ./toy_data/toy_data.jsonl \
--image_path ./toy_data/images \
--batch_size_per_device 1 \
--lr 2e-5 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 100 \
--epochs 100 \
--log_every 1 \
--results_dir ./results/toy_finetune
Some important arguments:
- num_processes: number of GPUs to use for training
- model_name_or_path: path to the pretrained model
- json_file: path to the JSON file containing the training data, e.g., ./toy_data/toy_data.jsonl
- image_path: path to the image folder, e.g., ./toy_data/images
- batch_size_per_device: batch size per device
- lr: learning rate
- keep_raw_resolution: whether to keep the original resolution of each image; if not set, all images are resized to (max_image_size, max_image_size)
- max_image_size: maximum image size
- gradient_accumulation_steps: number of steps over which to accumulate gradients
- ckpt_every: number of steps between checkpoint saves
- epochs: number of epochs
- log_every: number of steps between log outputs
- results_dir: path to the results folder
Each line of json_file is a JSON object in the following format:
{
"instruction": str,
"input_images": [str, str, ...],
"output_images": str
}
You can see a toy example in ./toy_data/toy_data.jsonl.
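A training file in this format can be assembled with a few lines of Python. The sketch below is only illustrative: the helpers `make_record`, `write_jsonl`, and `read_jsonl`, the placeholder instruction, and the file names are hypothetical. Only the three keys come from the format above, and it assumes the image entries are file names resolved against image_path.

```python
import json
import tempfile
from pathlib import Path

# Keys taken from the data format described above.
REQUIRED_KEYS = {"instruction", "input_images", "output_images"}


def make_record(instruction, input_images, output_image):
    """Build one training record with the three documented keys."""
    return {
        "instruction": instruction,
        "input_images": list(input_images),
        "output_images": output_image,
    }


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back and fail early if a line lacks a required key."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            records.append(record)
    return records


# Hypothetical example data; replace with your own instructions and files.
records = [
    make_record("describe the task here", ["input_0.png"], "output_0.png"),
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_data.jsonl"
    write_jsonl(records, path)
    loaded = read_jsonl(path)
```

Validating the file before launching training is cheap and catches malformed lines that would otherwise surface only after the model has been loaded.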
If an OOM (out of memory) error occurs, try decreasing batch_size_per_device or max_image_size. You can also use LoRA instead of full fine-tuning.
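When you lower batch_size_per_device to avoid OOM, a common way to keep training behavior comparable is to raise gradient_accumulation_steps so that the effective batch size stays the same. The sketch below shows the arithmetic. `effective_batch_size` is a hypothetical helper, and it assumes train.py accumulates gradients in the usual way (one optimizer update per gradient_accumulation_steps batches).

```python
def effective_batch_size(batch_size_per_device, num_processes, gradient_accumulation_steps):
    """Number of samples that contribute to one optimizer update."""
    return batch_size_per_device * num_processes * gradient_accumulation_steps


# Original setting: 2 samples per device, 1 GPU, no accumulation.
before = effective_batch_size(
    batch_size_per_device=2, num_processes=1, gradient_accumulation_steps=1
)

# Memory-saving setting: halve the per-device batch, double the accumulation.
after = effective_batch_size(
    batch_size_per_device=1, num_processes=1, gradient_accumulation_steps=2
)

print(before, after)
```

Both settings update the model with the same number of samples per step, but the second keeps fewer samples in GPU memory at once.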
The checkpoints can be found at {results_dir}/checkpoints/*. You can use the following code to load a saved checkpoint:
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("checkpoint_path") # e.g., ./results/toy_finetune/checkpoints/0000200
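Because a checkpoint is written every ckpt_every steps, it can be handy to pick the most recent one automatically. The sketch below is a hypothetical helper (`latest_checkpoint` is not part of OmniGen). It assumes each checkpoint under {results_dir}/checkpoints/ is named by its zero-padded step number, as in the 0000200 example above.

```python
import tempfile
from pathlib import Path


def latest_checkpoint(results_dir):
    """Return the checkpoint path with the highest step number, or None."""
    ckpt_root = Path(results_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    # Keep only entries whose names are purely numeric, e.g. "0000200".
    candidates = [p for p in ckpt_root.iterdir() if p.name.isdigit()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: int(p.name))


# Demonstration on a temporary directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp) / "toy_finetune"
    no_checkpoint_yet = latest_checkpoint(results_dir)  # nothing saved yet
    for step in ("0000100", "0000200"):
        (results_dir / "checkpoints" / step).mkdir(parents=True)
    latest_name = latest_checkpoint(results_dir).name
```

The returned path can then be passed to `OmniGenPipeline.from_pretrained` as a string, for example `str(latest_checkpoint("./results/toy_finetune"))`.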
LoRA fine-tuning is a simple way to fine-tune OmniGen with less GPU memory. To use LoRA, add --use_lora and --lora_rank to the command.
accelerate launch \
--num_processes=1 \
train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 3e-4 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 100 \
--epochs 100 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
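To get a feel for why LoRA needs less memory: for a weight matrix of shape (d_out, d_in), a rank-r adapter trains two small matrices of shapes (d_out, r) and (r, d_in) instead of the full matrix. The sketch below does that arithmetic for --lora_rank 8. The 3072 x 3072 layer size is an illustrative assumption, not a figure taken from OmniGen, and both helper names are hypothetical.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors: (d_out x rank) + (rank x d_in)."""
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the full weight matrix."""
    return d_in * d_out


# Illustrative layer size (an assumption, not an OmniGen figure).
d_in = d_out = 3072

lora = lora_trainable_params(d_in, d_out, rank=8)
full = full_params(d_in, d_out)

print(f"LoRA: {lora} trainable parameters, full: {full}, ratio: {lora / full:.4f}")
```

The frozen base weights still have to fit in memory; the savings come from the much smaller set of parameters that need gradients and optimizer state.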
The checkpoints can be found at {results_dir}/checkpoints/*. You can use the following code to load a checkpoint:
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path") # e.g., ./results/toy_finetune_lora/checkpoints/0000100
Here is an example of learning a new concept: "sks dog". We use five images of one dog from dog-example.
The JSON file is ./toy_data/toy_subject_data.jsonl, and the images are saved in ./toy_data/images.
accelerate launch \
--num_processes=1 \
train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 1e-3 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_subject_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 100 \
--epochs 200 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
After training, you can use the following code to generate images:
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path") # e.g., ./results/toy_finetune_lora/checkpoints/0000200
images = pipe(
prompt="a photo of sks dog running in the snow",
height=1024,
width=1024,
guidance_scale=3
)
images[0].save("example_sks_dog_snow.png")
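To check how well the new concept generalizes, you may want to run several prompts in a row. The sketch below wraps the call shown above in a loop. `prompt_to_filename` and `generate_all` are hypothetical helpers, the second prompt is made up for illustration, and the pipeline is called only with the arguments used in the example above.

```python
import re


def prompt_to_filename(prompt, suffix=".png"):
    """Turn a prompt into a filesystem-friendly file name."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return slug + suffix


def generate_all(pipe, prompts, height=1024, width=1024, guidance_scale=3):
    """Call the pipeline once per prompt and save the first image of each result."""
    saved = []
    for prompt in prompts:
        images = pipe(
            prompt=prompt,
            height=height,
            width=width,
            guidance_scale=guidance_scale,
        )
        name = prompt_to_filename(prompt)
        images[0].save(name)
        saved.append(name)
    return saved


prompts = [
    "a photo of sks dog running in the snow",
    "a photo of sks dog sleeping on a sofa",  # made-up prompt for illustration
]
names = [prompt_to_filename(p) for p in prompts]
```

With `pipe` created and the LoRA weights merged as shown above, `generate_all(pipe, prompts)` writes one PNG per prompt into the current directory.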