Most of my public LoRAs are here: https://civitai.com/user/chairfull
Model | Images Used | Note |
---|---|---|
Josh Brolin | 19 | Fewest images used. |
Guy Pierce | 32 | Used CLIP instead of BLIP. |
Woody Harrelson | 38 | First model I used regularisation images on. |
Jack Nicholson | 52 | Most downloaded [Male]. |
Kelly Brook | 146 | Captioned with CLIP Interrogator 2.1 at best setting. For most models I use BLIP. |
Anne Hathaway | 147 | Most downloaded. |
Maitland Ward | 325 | Most images used. |
Image quality = Model quality.
Image quantity = Model flexibility.
Image quality plays a big part in how well a LoRA turns out, so try to find the highest-quality images you can.
Many images I've used are over 2000x3000. Some >8000x5000. I only crop out other people and text. I don't resize.
A high-quality image != a big image. A high-quality image is one where, if you zoom in, you see details like skin pores, eye flecks, and fabric threads.
If you zoom in and it looks blurry, that image is someone's crummy upscale. Using too many of those images in training will give the model a cartoon airbrush look.
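If you want a rough automated check across a whole folder, the variance-of-the-Laplacian trick can flag the soft ones. This is just a minimal sketch, assuming opencv-python is installed; the folder name and threshold are placeholders, not part of my actual workflow.

```python
# Rough blur/upscale check: low Laplacian variance usually means few fine details.
# Assumes opencv-python; "training_images" and the 100.0 cutoff are placeholders.
import cv2
from pathlib import Path

for path in sorted(Path("training_images").iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    if sharpness < 100.0:
        print(f"{path.name}: looks soft (score {sharpness:.1f})")
```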
- Imagus: See the full image by hovering over it or a link, and hit `Ctrl+S` to save it.
- Double Click Image Downloader: For quicker downloading.
- YouTube Screenshot Button: Make sure to set the video quality to HD, as high as you can set it. Pause the video and use the `,` and `.` keys to move back/forward one frame, to find the least blurry frame.
- uBlock Origin: Nicest adblocker, imo.
Yandex: This is my go-to. Better than Google's image search, and makes it easy to find images in different sizes.
- Search your subject.
- Sort images by largest.
- On the right is a size dropdown; try to find the biggest.

You can also search for better-quality versions of an image by dragging it into Yandex to do a Similar image search.
I'm now using YouTube, and it works quite well. Get a YouTube screenshot Chrome extension and use the `,` and `.` keys to find unique facial angles that aren't blurry. (Be sure to set the video quality to the highest possible.)
For a person, attempt to find at least one of each:
- Profile left + profile right.
- 3/4 left + 3/4 right.
- Looking at camera.
- Looking up + looking down.
- (Bonus) Looking up + down at 3/4 left and right.
- (Bonus) All these angles with multiple expressions (happy, neutral, angry).
While looking for images I save as many as look decent, sometimes coming across higher-quality versions later, so I end up with duplicates.
To remove duplicates I use the Geeqie image viewer.
- Open Geeqie and go to your folder of images.
- Select all of them in the lower right panel.
- Right click and select Find duplicates.
- Sort on Similarity (low, med, high).
- If it finds any, get rid of whichever ones seem lower quality by right clicking and selecting Delete.
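If you'd rather script it, a perceptual-hash pass can catch much the same near-duplicates. A minimal sketch, assuming Pillow and the imagehash package are installed; the folder name and distance cutoff are placeholders.

```python
# Near-duplicate finder via perceptual hashes; small hash distances = likely dupes.
# Assumes Pillow + imagehash; "training_images" and the cutoff of 4 are placeholders.
from pathlib import Path
from PIL import Image
import imagehash

hashes = {}
for path in sorted(Path("training_images").iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    h = imagehash.phash(Image.open(path))
    for other, other_h in hashes.items():
        if h - other_h <= 4:
            print(f"possible duplicate: {path.name} ~ {other.name}")
    hashes[path] = h
```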
Select the few images that need cropping and drag them into bulkimagecrop.com.
While I try to remove other similar subjects (other males, if training on a male), and text, I don't try to center the subject. Having the subject dead center in every image could train the model to think you always want that.
I put the subject at the far left, far top, bottom right...
Once you've cropped and downloaded the images as zips, you can mass unzip them with `unzip \*.zip`.
Zip the images for upload: `zip ./my_pics -r .`
- Upload the zip to your Google Drive.
- Right click it in GDrive and select Share or Get link.
- Toggle Make Public.
- Click Copy link.
I used: https://github.com/Linaqruf/kohya-trainer (Dreambooth method, top one.)
I use the Google Colab version as my GPU sucks, but I assume it works the same if you run it on your PC.
I mostly use BLIP to auto-caption the images.
Recently I started upping the word count from 15-75 to 30-100, and the results have seemed a tinge better?
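If you want to caption outside the Colab, the Hugging Face BLIP captioning model does roughly the same job. A minimal sketch, assuming the transformers and Pillow packages; the 30/100 here mirror the range above, though note that `min_length`/`max_length` count tokens rather than words, and the filename is a placeholder.

```python
# Minimal BLIP auto-captioning sketch using Hugging Face transformers.
# min_length/max_length are token counts; 30/100 mirrors the range mentioned above.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("img1.png").convert("RGB")   # placeholder filename
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, min_length=30, max_length=100, num_beams=3)
print(processor.decode(out[0], skip_special_tokens=True))
```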
Leave pretty much all the settings at their defaults, except:
- For the pretrained model download: `Stable-Diffusion-v1-5`.
- For the VAE model download: `stablediffusion.vae.pt`.
- Set `pretrained_model_name_or_path` to `/content/pretrained_model/Stable-Diffusion-v1-5.safetensors`.
- Set `vae` to `/content/vae/stablediffusion.vae.pt`.
- Set `class_token` to `man`.
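As far as I understand, the Colab cells end up calling kohya's train_network.py under the hood. Here's a rough sketch of the equivalent invocation, assuming sd-scripts is installed locally; the data/output paths are placeholders and flag names can shift between versions, so check `train_network.py --help` before relying on it.

```python
# Rough sketch of the underlying sd-scripts call (not the exact Colab code).
# Data/output paths are placeholders; verify flag names against your version.
import subprocess

subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "/content/pretrained_model/Stable-Diffusion-v1-5.safetensors",
    "--vae", "/content/vae/stablediffusion.vae.pt",
    "--network_module", "networks.lora",
    "--network_dim", "64",          # see the notes below on 32 vs 64
    "--network_alpha", "32",        # values 1/4/32 discussed below
    "--train_data_dir", "/content/LoRA/train_data",   # placeholder
    "--output_dir", "/content/LoRA/output",           # placeholder
    "--save_model_as", "safetensors",
], check=True)
```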
For the learning rate, setting it to 5 seems to really lower the loss faster. I've tried it at .00001, .01, .6, 2, 5, and 15, and 5 seemed the best.
I had `network_dim` at 32 for the longest time, but raising it to 64 really improved the likeness.
I don't understand the relation with `network_alpha`. I've set it to 1, 4, and 32 and couldn't see any difference.
Random ideas I'm trying out:
I've tried all different methods with regularisation images and don't find them that great. Maybe I'm doing something wrong.
For this Woody Harrelson model, I used all the photos of males that I've trained other models with, as regularization data. Didn't crop anything or care about aspect ratio. It seems Kohya will bucket them. Everything worked fine.
Tokens in the captions are what you don't want trained as part of your model, with the exception of the `class_token`. So for a man, the caption `a man, in a red hat, in a forest` would only extract the `man`, not the `red hat` or the `forest`.
Theoretically, this should work for style and image quality too, so for old images I might add `blurry, old image, scan, jpeg artifacts, low quality` in hopes the model will pull a sharper image.
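A small script could tack those tags onto the captions of the images you know are old scans. A minimal sketch, assuming the captions are sidecar text files next to each image; the filenames are placeholders, and the extension should match whatever your captioning step wrote (e.g. `.txt` or `.caption`).

```python
# Append quality tags to the sidecar captions of known low-quality images.
# Filenames and folder are placeholders; adjust CAPTION_EXT to match your setup.
from pathlib import Path

CAPTION_EXT = ".txt"
QUALITY_TAGS = "blurry, old image, scan, jpeg artifacts, low quality"
old_images = ["scan_001.jpg", "scan_002.jpg"]   # placeholder list

for name in old_images:
    caption_path = Path("training_images") / Path(name).with_suffix(CAPTION_EXT)
    caption = caption_path.read_text().strip()
    caption_path.write_text(f"{caption}, {QUALITY_TAGS}\n")
```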
For this model I captioned 146 images with CLIP Interrogator 2.1 on the `best` setting.
It took a long time, and I don't know that it was worth it. Theoretically it should be more flexible than other models. Needs more testing.
To get more expressiveness out of the training data, I'm going to try a sentiment analyzer on a set of photos.
Maybe instead of a single subject, I will train on a ton of random faces with different emotions at different angles, and then caption each like:
`img1.png: a man on the beach, neutral_90 sad_20 fear_5 happy_3`
`img2.png: a woman at work, happy_40 neutral_20 sad_9`
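A rough sketch of how that captioning pass could look, using DeepFace as the "sentiment analyzer". This is just one option, not a tool I've settled on; check its docs for the exact return format on your version, and the base captions here are stand-ins for whatever BLIP produces.

```python
# Emotion-weighted caption sketch: append tag_NN suffixes from a face analyzer.
# Assumes the deepface package; base captions are placeholders (BLIP would supply them).
from deepface import DeepFace

base_captions = {
    "img1.png": "a man on the beach",
    "img2.png": "a woman at work",
}

for filename, caption in base_captions.items():
    faces = DeepFace.analyze(img_path=filename, actions=["emotion"])
    emotions = faces[0]["emotion"]            # e.g. {"happy": 40.2, "sad": 9.1, ...}
    tags = " ".join(
        f"{name}_{int(round(score))}"
        for name, score in sorted(emotions.items(), key=lambda kv: -kv[1])
        if score >= 3                         # drop near-zero emotions
    )
    print(f"{filename}: {caption}, {tags}")
```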