# Text-to-image fine-tuning

Completed by following the diffusers text-to-image tutorial, available in the stable or the latest version of the docs (the latest has more details).

## Set-up

Following the guide, install the diffusers library locally from GitHub (we do this within the srt2i env):

```bash
# Clone diffusers and install it from source
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

# Install the requirements of the text-to-image example
cd examples/text_to_image
pip install -r requirements.txt
```
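
A quick sanity check (not part of the guide) that the source install is the one Python actually imports:

```bash
# Print the installed diffusers version and its location
python -c "import diffusers; print(diffusers.__version__, diffusers.__file__)"
```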

then run:

```bash
accelerate config
```

We currently select only the basic options and answer no to any accelerator proposed.

The config file is saved to `/home/<user>/.cache/huggingface/accelerate/default_config.yaml`.
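
As an alternative to the interactive prompts (not part of the original guide), `accelerate` can also write a default config directly:

```bash
# Write a default config without any prompts, then inspect it
accelerate config default
cat ~/.cache/huggingface/accelerate/default_config.yaml
```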

## 1. Training with a HF dataset

Still following the diffusers tutorial, we retrain on an existing dataset.
The retraining script is `./diffusers/examples/text_to_image/0_myscript.sh`; it calls the existing Python training script with various options.
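
For reference, a launch command following the diffusers tutorial looks like the sketch below; the base model, dataset name, and hyperparameters are the tutorial's examples, and the actual `0_myscript.sh` may differ:

```bash
# Sketch following the diffusers tutorial; the actual 0_myscript.sh
# may use different (experimental) options.
export MODEL_NAME="runwayml/stable-diffusion-v1-5"      # example base model
export DATASET_NAME="lambdalabs/pokemon-blip-captions"  # example HF dataset

accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```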

After training, the model is run with the usual diffusers pipeline (see the inference notebook):

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the fine-tuned weights produced by the training script
model_path = "sd-pokemon-model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")  # fp16 weights require a CUDA device

image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
```

## 2. Training with a custom dataset

This requires a dataset in the correct format:

- a folder with the images
- a `metadata.jsonl` file (in the same folder) with one JSON object per image, as in the sketch after this list:

```jsonl
{"file_name": "0001.png", "text": "This is an image caption/description"}
{"file_name": "0002.png", "text": "This is another image caption/description"}
...
```
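
A minimal sketch of how such a dataset can be written and then loaded back with the 🤗 `datasets` "imagefolder" loader (the folder name `my_dataset` and the captions are hypothetical):

```python
import json
from datasets import load_dataset

# Hypothetical captions keyed by file name; in practice these come
# from the captioning step.
captions = {
    "0001.png": "a drawing of a green pokemon",
    "0002.png": "a red bird pokemon",
}

# Write one JSON object per line, next to the images
with open("my_dataset/metadata.jsonl", "w") as f:
    for file_name, text in captions.items():
        f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")

# The "imagefolder" loader picks up metadata.jsonl automatically
dataset = load_dataset("imagefolder", data_dir="my_dataset", split="train")
print(dataset[0]["text"])
```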

See the inference notebook for a conversion from a Kaggle dataset.
For the implementation starting from the dataset generated after the llava captioning step, see the retrain notebook tutorial.

The retraining script is slightly different this time, `./1_custom_ds.sh`, due to experimental changes in the training parameters.
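
The main difference for a local dataset is pointing the training script at the image folder via `--train_data_dir` instead of `--dataset_name`. A sketch only; the output directory name is hypothetical and the experimental parameters in `1_custom_ds.sh` may differ:

```bash
# Sketch: train on a local folder containing images + metadata.jsonl;
# 1_custom_ds.sh's experimental parameters may differ.
export TRAIN_DIR="my_dataset"

accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir=$TRAIN_DIR \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --output_dir="sd-custom-model"  # hypothetical output name
```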