Generative AI is a fast-growing area of machine learning with applications across many industries, including AI-generated art. To explore this area further, we fine-tuned an image diffusion model to generate pixel-style GIFs that morph between two images produced from text prompts. The pipeline fine-tunes an existing diffusion model on pixel-art datasets, generates one image from each of the two prompts, and interpolates between the prompts and their latent noise tensors to morph one generated image into the other. The images produced at each interpolation step are then stitched together into a pixel-art GIF. This research aims to explore a more specialized use case for diffusion models and to contribute to the field of generative modeling.
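The interpolation step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes linear interpolation for the prompt embeddings and spherical interpolation (slerp) for the latent noise, which is a common choice for Gaussian noise but is not confirmed by this README. The function names and the array shapes in the usage note are hypothetical, and NumPy arrays stand in for the tensors a real pipeline would use.

```python
import numpy as np


def lerp(a, b, t):
    """Linear interpolation between two arrays for t in [0, 1]."""
    return (1.0 - t) * a + t * b


def slerp(a, b, t, eps=1e-7):
    """Spherical interpolation between two arrays of the same shape."""
    a_flat, b_flat = a.ravel(), b.ravel()
    cos_omega = np.dot(a_flat, b_flat) / (
        np.linalg.norm(a_flat) * np.linalg.norm(b_flat)
    )
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # The inputs are (anti)parallel, so fall back to linear interpolation.
        return lerp(a, b, t)
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / sin_omega


def interpolation_schedule(emb_a, emb_b, noise_a, noise_b, num_frames):
    """Yield one (prompt embedding, latent noise) pair per output frame."""
    for t in np.linspace(0.0, 1.0, num_frames):
        yield lerp(emb_a, emb_b, t), slerp(noise_a, noise_b, t)
```

Each yielded pair would be passed to the diffusion model to render one frame, so `num_frames` controls how smooth the morph looks. The first and last frames reproduce the images for the first and second prompt.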
The datasets that we experimented with during fine-tuning are listed below. All are hosted on Hugging Face.
- jainr3/diffusiondb-pixelart: A subset of the DiffusionDB dataset whose image samples have been passed through the pixelatorapp.com tool to produce "pixel-art" style images.
- sunilSabnis/pixelart: A dataset of pixel-style art generated by the stable-diffusion-2-1 model itself. The prompts were selected from andyyang/stable_diffusion_prompts_2m.
- jiovine/pixel-art-nouns-2k: A class-specific dataset of pixel-style art; specifically, the images are of cartoon characters.
The models obtained by fine-tuning on these datasets are listed below. All are hosted on Hugging Face.
- jainr3/sd-diffusiondb-pixelart-model-lora: LoRA adaptation weights for stabilityai/stable-diffusion-2-1, fine-tuned on the jainr3/diffusiondb-pixelart dataset.
- jainr3/sd-pixelart-model-lora: LoRA adaptation weights for stabilityai/stable-diffusion-2-1, fine-tuned on the sunilSabnis/pixelart dataset.
- jainr3/sd-nouns-model-lora: LoRA adaptation weights for stabilityai/stable-diffusion-2-1, fine-tuned on the jiovine/pixel-art-nouns-2k dataset.
- jainr3/sd-diffusiondb-pixelart-v2-model-lora: LoRA adaptation weights for stabilityai/stable-diffusion-2-1, fine-tuned on the jainr3/diffusiondb-pixelart dataset. This model was trained for 30 epochs, whereas jainr3/sd-diffusiondb-pixelart-model-lora was trained for only 5 epochs.
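For inference, one of the LoRA weight sets listed above can be attached to the base model with the diffusers library. The sketch below is an assumption about usage rather than the project's own loading code: the short keys in `LORA_REPOS` and the function name `load_pixel_pipeline` are hypothetical, and it assumes the weights were saved in the attention-processor format that `unet.load_attn_procs` reads. Newer diffusers releases also provide `pipe.load_lora_weights`, which may be the better fit depending on the installed version.

```python
BASE_MODEL = "stabilityai/stable-diffusion-2-1"

# Hypothetical short names mapped to the LoRA repositories listed above.
LORA_REPOS = {
    "diffusiondb": "jainr3/sd-diffusiondb-pixelart-model-lora",
    "diffusiondb-v2": "jainr3/sd-diffusiondb-pixelart-v2-model-lora",
    "pixelart": "jainr3/sd-pixelart-model-lora",
    "nouns": "jainr3/sd-nouns-model-lora",
}


def load_pixel_pipeline(variant="diffusiondb-v2", device="cuda"):
    """Load the base model and attach one set of LoRA weights."""
    # Imported lazily so the module can be imported without torch/diffusers.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16
    )
    # Assumes attention-processor LoRA weights; see the note above.
    pipe.unet.load_attn_procs(LORA_REPOS[variant])
    return pipe.to(device)
```

With a GPU available, `load_pixel_pipeline("nouns")("A surfer").images[0]` would return a single generated image as a PIL object.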
- First text prompt: "Snowy cabin in the woods"
  Second text prompt: "A medieval castle"
  See here for more details.
- First text prompt: "The sun shining brightly"
  Second text prompt: "A full moon glowing brightly"
  See here for more details.
- First text prompt: "A snowy mountain"
  Second text prompt: "Pyramids in Egypt"
  See here for more details.
- First text prompt: "A surfer"
  Second text prompt: "A snowboarder"
- GIF-chaining example
  'Snowy cabin in the woods', 'A house boat on a lake', 'A beach house on a sunny day', 'A medieval castle', 'A gothic style clock tower', 'A skyscraper in a large metropolitan city', 'The empire state building in new york city during night'
  See here for more details.
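One way to read the GIF-chaining example is that each consecutive pair of prompts is morphed separately and the resulting segments are joined end to end. The sketch below illustrates that reading only; it is an assumption, not the project's implementation. The helper names are hypothetical, and it further assumes that each segment starts on the frame where the previous segment ended, so the duplicated boundary frame is dropped.

```python
def chain_prompt_pairs(prompts):
    """Return consecutive (first, second) prompt pairs for a chained GIF."""
    if len(prompts) < 2:
        raise ValueError("chaining needs at least two prompts")
    return list(zip(prompts, prompts[1:]))


def chain_frames(segments):
    """Join per-pair frame lists, skipping each repeated boundary frame."""
    frames = []
    for index, segment in enumerate(segments):
        # Every segment after the first begins with the previous end frame.
        frames.extend(segment if index == 0 else segment[1:])
    return frames


prompts = [
    "Snowy cabin in the woods",
    "A house boat on a lake",
    "A beach house on a sunny day",
    "A medieval castle",
]
for first, second in chain_prompt_pairs(prompts):
    print(f"{first!r} -> {second!r}")
```

A list of N prompts yields N - 1 morph segments, so the seven prompts in the example above would produce six.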
To get a local copy up and running, follow these steps.
Most steps require a powerful GPU. One option is Google Colaboratory, where an A100 high-RAM GPU is readily available on the Colab Pro plan.
- Clone the repo
  git clone https://github.com/sunil-2000/text-to-pixel-gif.git
- Install the requirements (install diffusers and transformers libraries at a minimum for inference)
  pip install -r requirements.txt
Model Fine-Tuning
- Obtain a Hugging Face API key from https://huggingface.co/ and save it for later.
- Obtain a Wandb API key from https://wandb.ai/ and save it for later.
- Use the fine-tuning scripts located in the colab-notebooks folder. It contains example scripts for the different experiments that we performed, for example using different datasets or training for a shorter or longer time. The scripts prompt for the API keys when they are needed.
GIF Generation / Chaining
- See this example for detailed instructions.
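As a rough illustration of the final stitching step, the frames produced during interpolation can be written to an animated GIF with Pillow. This sketch assumes the frames are PIL images, which is how diffusers pipelines return them by default. The function names, the 100 ms frame duration, and the solid-color demo frames are hypothetical placeholders and are not taken from the project.

```python
from io import BytesIO

from PIL import Image


def frames_to_gif(frames, destination, frame_ms=100):
    """Write a sequence of PIL images to an animated, looping GIF."""
    if not frames:
        raise ValueError("at least one frame is required")
    first, rest = frames[0], frames[1:]
    first.save(
        destination,
        format="GIF",
        save_all=True,        # write every frame, not just the first
        append_images=rest,
        duration=frame_ms,    # display time per frame in milliseconds
        loop=0,               # 0 means loop forever
    )


def demo_frames(count=5, size=(64, 64)):
    """Solid-color stand-ins for generated images (each frame differs)."""
    return [
        Image.new("RGB", size, (255 * i // (count - 1), 64, 255 - 255 * i // (count - 1)))
        for i in range(count)
    ]


buffer = BytesIO()
frames_to_gif(demo_frames(), buffer)
```

`destination` can be a file path such as `morph.gif` or any writable binary file object, as in the in-memory buffer used here.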
Distributed under the MIT License. See LICENSE.txt for more information.
Rahul Jain, Sunil Sabnis, Joseph Iovine, Kenneth Alvarez, and Carlos Ponce
Project Link: https://github.com/sunil-2000/text-to-pixel-gif
This project was created as a part of the CS 5787 Deep Learning Final Project for the Spring 2023 semester at Cornell Tech under the guidance of Professor Alex Jaimes.