
# Music to Image Interpolation

*Open In Colab* (link to the project in Google Colab)


A generative AI pipeline that produces image interpolations from an audio track, leveraging Stable Diffusion.


## Examples


Steve Reich - Music for Pieces of Wood (30-second extract) (fps=7, num_inference_steps=20)



Karlheinz Stockhausen - Helicopter String Quartet (25 seconds) (fps=5, num_inference_steps=30)



Jean-Claude Risset - SUD (30-second extract) (fps=7, num_inference_steps=20)



Antonio Vivaldi - Winter (15-second extract) (fps=7, num_inference_steps=20)


## Pipeline

*Pipeline diagram*


## Information

The core of the system is the Stable Diffusion img2img pipeline from Hugging Face. Image embeddings are created with Meta's ImageBind, a multimodal model that maps audio data into the image embedding space.
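Below is a minimal sketch of that audio-to-image step, assuming Meta's ImageBind package (github.com/facebookresearch/ImageBind) and the Stable unCLIP variant of Stable Diffusion from diffusers, which accepts an image embedding directly, as in Anything2Image. The audio path `clip.wav` and the checkpoint choice are placeholders, not this repository's exact code:

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageBind: a multimodal encoder whose audio embeddings live in the same
# space as its image embeddings (both 1024-dimensional).
bind = imagebind_model.imagebind_huge(pretrained=True).to(device).eval()
with torch.no_grad():
    inputs = {ModalityType.AUDIO: data.load_and_transform_audio_data(["clip.wav"], device)}
    audio_embeds = bind(inputs)[ModalityType.AUDIO]  # shape (1, 1024)

# Stable unCLIP conditions generation on an image embedding, so the audio
# embedding can be passed in its place.
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to(device)

frame = pipe(image_embeds=audio_embeds.to(dtype=torch.float16),
             num_inference_steps=20).images[0]
frame.save("frame.png")
```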

The interpolation code is adapted from nateraw's publicly available stable-diffusion-videos (https://github.com/nateraw/stable-diffusion-videos.git), and the detextifier is adapted from iuliaturc's detextify (https://github.com/iuliaturc/detextify.git). The Stable Diffusion and ImageBind models are combined following Zeqiang-Lai's Anything2Image (https://github.com/Zeqiang-Lai/Anything2Image.git).
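As an illustration of the interpolation step, here is the spherical interpolation (slerp) used by stable-diffusion-videos, in condensed form: intermediate frames come from embeddings (or initial latents) blended along the unit sphere rather than a straight line, with `t` running from 0 to 1 between consecutive audio windows. The usage comment at the bottom reuses the hypothetical `pipe` and embedding names from the sketch above:

```python
import numpy as np
import torch

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherically interpolate between two embedding/latent tensors.

    Falls back to linear interpolation when the inputs are nearly parallel,
    where the spherical formula becomes numerically unstable.
    """
    a, b = v0.cpu().numpy(), v1.cpu().numpy()
    dot = np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
    if np.abs(dot) > dot_threshold:
        out = (1 - t) * a + t * b  # near-parallel: plain lerp
    else:
        theta = np.arccos(dot)
        out = (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return torch.from_numpy(out).to(v0.device, v0.dtype)

# Usage sketch: fps=7 intermediate frames between two audio-window embeddings.
# frames = [pipe(image_embeds=slerp(t, e0, e1), num_inference_steps=20).images[0]
#           for t in np.linspace(0.0, 1.0, 7)]
```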