
# Music to Image Interpolation

*Open In Colab* (link to the project in Google Colab)


A generative AI pipeline that produces image interpolations from an audio track, leveraging Stable Diffusion.


## Examples


Steve Reich - Music for Pieces of Wood (30-second extract) (fps=7, num_inference_steps=20)



Karlheinz Stockhausen - Helicopter String Quartet (25 seconds) (fps=5, num_inference_steps=30)



Jean-Claude Risset - SUD (30-second extract) (fps=7, num_inference_steps=20)



Antonio Vivaldi - Winter (15-second extract) (fps=7, num_inference_steps=20)


## Pipeline

*Pipeline diagram*


## Information

The core of the system is the Stable Diffusion img2img pipeline from Hugging Face. Image embeddings are created with Meta's ImageBind, a multimodal model that maps audio data into the image embedding space.
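Below is a minimal sketch of that audio-to-image step, assuming Meta's ImageBind package (github.com/facebookresearch/ImageBind) and the Stable unCLIP variant of Stable Diffusion from diffusers, which accepts an image embedding directly, as in Anything2Image. The audio path `clip.wav` and the checkpoint choice are placeholders, not this repository's exact code:

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageBind: a multimodal encoder whose audio embeddings live in the same
# space as its image embeddings (both 1024-dimensional).
bind = imagebind_model.imagebind_huge(pretrained=True).to(device).eval()
with torch.no_grad():
    inputs = {ModalityType.AUDIO: data.load_and_transform_audio_data(["clip.wav"], device)}
    audio_embeds = bind(inputs)[ModalityType.AUDIO]  # shape (1, 1024)

# Stable unCLIP conditions generation on an image embedding, so the audio
# embedding can be passed in its place.
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to(device)

frame = pipe(image_embeds=audio_embeds.to(dtype=torch.float16),
             num_inference_steps=20).images[0]
frame.save("frame.png")
```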

The interpolation code is adapted from nateraw's publicly available stable-diffusion-videos (https://github.com/nateraw/stable-diffusion-videos.git), and the detextifier is adapted from iuliaturc's detextify (https://github.com/iuliaturc/detextify.git). The Stable Diffusion and ImageBind models are combined following Zeqiang-Lai's Anything2Image (https://github.com/Zeqiang-Lai/Anything2Image.git).
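As an illustration of the interpolation step, here is the spherical interpolation (slerp) used by stable-diffusion-videos, in condensed form: intermediate frames come from embeddings (or initial latents) blended along the unit sphere rather than a straight line, with `t` running from 0 to 1 between consecutive audio windows. The usage comment at the bottom reuses the hypothetical `pipe` and embedding names from the sketch above:

```python
import numpy as np
import torch

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherically interpolate between two embedding/latent tensors.

    Falls back to linear interpolation when the inputs are nearly parallel,
    where the spherical formula becomes numerically unstable.
    """
    a, b = v0.cpu().numpy(), v1.cpu().numpy()
    dot = np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
    if np.abs(dot) > dot_threshold:
        out = (1 - t) * a + t * b  # near-parallel: plain lerp
    else:
        theta = np.arccos(dot)
        out = (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return torch.from_numpy(out).to(v0.device, v0.dtype)

# Usage sketch: fps=7 intermediate frames between two audio-window embeddings.
# frames = [pipe(image_embeds=slerp(t, e0, e1), num_inference_steps=20).images[0]
#           for t in np.linspace(0.0, 1.0, 7)]
```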