Pixel Giffusion (text-to-pixel-GIF)

Generating Pixel-Art-Style Gifs from Text Prompts

Table of Contents
  1. About The Project
  2. Getting Started
  3. License
  4. Contact
  5. Acknowledgments

About The Project

Generative AI is a fast-growing area of machine learning with applications in many industries, including AI-generated art. To build on this, we fine-tuned an image diffusion model to generate pixel-style GIFs that morph between two images produced from text prompts. The process has three steps: fine-tune an existing diffusion model on pixel-art datasets, generate one image from each of the two prompts, and interpolate between the prompts and between the latent noise tensors to morph one generated image into the other. To create a pixel-art GIF, we collect the image generated at each interpolation step and stitch these frames together. This research aims to explore a narrower, more specific use case for diffusion models and to contribute to the field of generative modeling.
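
The interpolation step can be illustrated with a small, self-contained sketch. It assumes spherical linear interpolation (slerp), a common choice for Gaussian latent noise; whether the project uses slerp or plain linear interpolation for each tensor is not stated here. NumPy arrays stand in for the latent and prompt-embedding tensors, and the function names and array shapes are illustrative placeholders rather than names from the project's notebooks.

```python
import numpy as np


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherically interpolate between two arrays of the same shape."""
    a = v0.ravel()
    b = v1.ravel()
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Nearly parallel vectors: fall back to plain linear interpolation.
    if abs(cos_theta) > dot_threshold:
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    w0 = np.sin((1.0 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return w0 * v0 + w1 * v1


def interpolation_steps(latent_a, latent_b, embed_a, embed_b, num_frames):
    """Yield one (latent, embedding) pair per GIF frame."""
    for t in np.linspace(0.0, 1.0, num_frames):
        yield slerp(t, latent_a, latent_b), slerp(t, embed_a, embed_b)


# Placeholder tensors: random arrays instead of real latents and embeddings.
rng = np.random.default_rng(0)
latent_a = rng.standard_normal((4, 64, 64))
latent_b = rng.standard_normal((4, 64, 64))
embed_a = rng.standard_normal((77, 1024))
embed_b = rng.standard_normal((77, 1024))
steps = list(interpolation_steps(latent_a, latent_b, embed_a, embed_b, num_frames=5))
```

In a real pipeline, each yielded pair would be passed to the diffusion model to render one frame; the first and last frames correspond to the two original prompts.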


Datasets & Models

We experimented with the datasets listed below during fine-tuning. All of them are available on Hugging Face.

  1. jainr3/diffusiondb-pixelart: This is a subset of the DiffusionDB dataset containing image samples that have been passed through the pixelatorapp.com tool to make "pixel-art" style images.

  2. sunilSabnis/pixelart: This is a dataset of pixel-style art generated by the stable-diffusion-2-1 model itself. The prompts were selected from andyyang/stable_diffusion_prompts_2m.

  3. jiovine/pixel-art-nouns-2k: This is a class-specific dataset of pixel-style art; specifically, the images are of cartoon characters.

Fine-tuning on these datasets produced the models listed below. All of them are available on Hugging Face.

  1. jainr3/sd-diffusiondb-pixelart-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the jainr3/diffusiondb-pixelart dataset.

  2. jainr3/sd-pixelart-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the sunilSabnis/pixelart dataset.

  3. jainr3/sd-nouns-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the jiovine/pixel-art-nouns-2k dataset.

  4. jainr3/sd-diffusiondb-pixelart-v2-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the jainr3/diffusiondb-pixelart dataset. This model was trained for 30 epochs, whereas the jainr3/sd-diffusiondb-pixelart-model-lora model was trained for only 5 epochs.
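
As a rough sketch of how one of these LoRA weight sets might be attached to the base model for inference, the snippet below uses the diffusers `StableDiffusionPipeline`. The short variant keys and the `load_pixel_pipeline` helper are hypothetical names introduced for illustration. Depending on the installed diffusers version and how the weights were saved, `pipe.unet.load_attn_procs(...)` may be the appropriate call instead of `load_lora_weights`.

```python
BASE_MODEL = "stabilityai/stable-diffusion-2-1"

# Hypothetical short names mapped to the LoRA repositories listed above.
LORA_REPOS = {
    "diffusiondb": "jainr3/sd-diffusiondb-pixelart-model-lora",
    "diffusiondb-v2": "jainr3/sd-diffusiondb-pixelart-v2-model-lora",
    "pixelart": "jainr3/sd-pixelart-model-lora",
    "nouns": "jainr3/sd-nouns-model-lora",
}


def load_pixel_pipeline(variant="diffusiondb-v2", device="cuda"):
    """Load the base model and attach one set of LoRA weights (a sketch)."""
    # Imported lazily so the mapping above is usable without torch installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16
    )
    # Assumption: the installed diffusers version can read these weights here.
    pipe.load_lora_weights(LORA_REPOS[variant])
    return pipe.to(device)


# Example usage (downloads several GB of weights and needs a CUDA GPU):
# pipe = load_pixel_pipeline("nouns")
# image = pipe("A snowy mountain", num_inference_steps=30).images[0]
# image.save("mountain.png")
```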


Examples

  1. First text prompt: "Snowy cabin in the woods"

    Second text prompt: "A medieval castle"

    See here for more details.

  2. First text prompt: "The sun shining brightly"

    Second text prompt: "A full moon glowing brightly"

    See here for more details.

  3. First text prompt: "A snowy mountain"

    Second text prompt: "Pyramids in Egypt"

    See here for more details.

  4. First text prompt: "A surfer"

    Second text prompt: "A snowboarder"

  5. GIF-chaining example

    'Snowy cabin in the woods', 'A house boat on a lake', 'A beach house on a sunny day', 'A medieval castle', 'A gothic style clock tower', 'A skyscraper in a large metropolitan city', 'The empire state building in new york city during night'

    See here for more details.
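
One way to think about GIF chaining is as a sequence of two-prompt morphs in which each prompt is the end of one segment and the start of the next. The sketch below only plans the frames and does not run a model. The helper names `chain_segments` and `frame_plan` are hypothetical, and dropping the first frame of each later segment assumes that neighbouring segments share the same latent at their common prompt.

```python
def chain_segments(prompts):
    """Return consecutive (start, end) prompt pairs for a chained GIF."""
    if len(prompts) < 2:
        raise ValueError("need at least two prompts to morph between")
    return list(zip(prompts, prompts[1:]))


def frame_plan(prompts, frames_per_segment):
    """List (start, end, t) triples covering the whole chain."""
    if frames_per_segment < 2:
        raise ValueError("each segment needs at least two frames")
    plan = []
    for index, (start, end) in enumerate(chain_segments(prompts)):
        for step in range(frames_per_segment):
            # Skip t=0 on later segments; under the shared-latent assumption
            # it would repeat the previous segment's final frame.
            if index > 0 and step == 0:
                continue
            t = step / (frames_per_segment - 1)
            plan.append((start, end, t))
    return plan


prompts = [
    "Snowy cabin in the woods",
    "A house boat on a lake",
    "A beach house on a sunny day",
    "A medieval castle",
]
plan = frame_plan(prompts, frames_per_segment=5)
```

Each triple names the two prompts being blended and the interpolation fraction `t` for that frame, so the plan can be fed to whatever interpolation and rendering code produces the images.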


Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

Most parts of this project require a powerful GPU. One option is Google Colaboratory, where an A100 GPU with a high-RAM runtime is readily available on the Colab Pro plan.

Installation

  1. Clone the repo
    git clone https://github.com/sunil-2000/text-to-pixel-gif.git
  2. Install the requirements (for inference, at a minimum the diffusers and transformers libraries are needed)
    pip install -r requirements.txt

Model Fine-Tuning

  1. Obtain a Hugging Face API key from https://huggingface.co/ and save it for later.

  2. Obtain a Weights & Biases (wandb) API key from https://wandb.ai/ and save it for later.

  3. Use the fine-tuning scripts in the colab-notebooks folder. There are several example scripts, one for each experiment we performed, such as fine-tuning on a different dataset or training for fewer or more epochs. The scripts prompt for the API keys when they are needed.

GIF Generation / Chaining

  1. See this example for detailed instructions.
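
For the final stitching step, a minimal sketch using Pillow is shown below. It assumes the frames are already available as PIL images; solid-color images stand in for diffusion outputs here. The `frames_to_gif` helper and the 100 ms frame duration are illustrative choices, not taken from the project's notebook.

```python
from io import BytesIO

from PIL import Image


def frames_to_gif(frames, target, frame_ms=100):
    """Write a list of PIL images to `target` as a looping animated GIF."""
    if not frames:
        raise ValueError("no frames to write")
    first, rest = frames[0], frames[1:]
    first.save(
        target,
        format="GIF",
        save_all=True,        # write every frame, not just the first
        append_images=rest,   # remaining frames, in order
        duration=frame_ms,    # display time per frame in milliseconds
        loop=0,               # 0 means loop forever
    )


# Stand-in frames: solid colors instead of diffusion outputs.
frames = [Image.new("RGB", (64, 64), (40 * i, 0, 255 - 40 * i)) for i in range(6)]
buffer = BytesIO()
frames_to_gif(frames, buffer)
```

Passing a file name such as "morph.gif" instead of the in-memory buffer writes the animation to disk.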


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Rahul Jain, Sunil Sabnis, Joseph Iovine, Kenneth Alvarez, and Carlos Ponce

Project Link: https://github.com/sunil-2000/text-to-pixel-gif


Acknowledgments

This project was created as the final project for CS 5787 Deep Learning in the Spring 2023 semester at Cornell Tech, under the guidance of Professor Alex Jaimes.

