In this project we create annotations/labels for Cuban salsa videos using deep learning models. Salsa is a couple dance where movement patterns repeat in counts of four steps.
We tried different sizes of elementary units and different notation systems, and ended up labelling the data with the same kind of units that dancers themselves use. So, our unit consists of elements that are as long as the dancers' count 123-567, i.e. two bars of music.
The labelling problem and the class imbalance are addressed with a choreography in which each dancer repeats each element twice and then turns 90 degrees, so that the same choreography can be filmed from a different angle. We film a total of 4 different angles, so each dancer repeats each figure a total of 8 times.
We made a video of the choreography for the people who recorded salsa videos for us: https://drive.google.com/file/d/1tX5dczXymc4EjAB0A9-5mkPx-pvV412n/view?usp=sharing
There is a presentation of our project on YouTube at https://youtu.be/n96vY72kABs?t=304
The input data is videos recorded on mobile phones. In preprocessing, the videos are converted to the same size and speed (resolution and frame rate).
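For illustration, this step could be scripted roughly as follows. This is a minimal sketch that assumes ffmpeg is available; the function name and the target resolution and frame rate are placeholder values, not necessarily the ones we used.

import subprocess

def normalize_video(src, dst, width=640, height=480, fps=30):
    # Rescale every video to a common resolution and frame rate.
    subprocess.run(
        ["ffmpeg", "-i", src, "-vf", f"scale={width}:{height}", "-r", str(fps), dst],
        check=True,
    )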
Information about the dancer in the videos is simplified by capturing key points: we use OpenPose for pose detection [1]. OpenPose gives us the x- and y-coordinates of 25 different joints, together with a confidence (visibility) score for each. This temporal sequence of coordinates is used as the input of the model.
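For concreteness, here is a sketch of how such a sequence can be read from OpenPose's per-frame JSON output, assuming OpenPose was run with --write_json and the BODY_25 model; the function and directory names are illustrative.

import glob
import json
import numpy as np

def load_pose_sequence(json_dir):
    frames = []
    for path in sorted(glob.glob(f"{json_dir}/*_keypoints.json")):
        with open(path) as f:
            people = json.load(f)["people"]
        if people:
            # Flat list of 25 joints x (x, y, confidence) = 75 values.
            keypoints = np.array(people[0]["pose_keypoints_2d"]).reshape(25, 3)
        else:
            keypoints = np.zeros((25, 3))  # no dancer detected in this frame
        frames.append(keypoints)
    return np.stack(frames)  # shape: (num_frames, 25, 3)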
We drop the parts of the frame where the dancer is never present and normalize the coordinates, so that all dancers are on the same scale regardless of their distance to the camera.
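One possible version of this cropping and scaling, as a sketch; the exact scheme shown here (a bounding box over the whole video, scaled by its height) is our assumption, not necessarily the normalization used in the code.

import numpy as np

def normalize_sequence(seq):
    # seq: (num_frames, 25, 3) array of (x, y, confidence) per joint.
    xy = seq[:, :, :2]
    visible = seq[:, :, 2] > 0       # undetected joints are reported as zeros
    xs = xy[:, :, 0][visible]
    ys = xy[:, :, 1][visible]
    x0, y0 = xs.min(), ys.min()      # crop: drop the area the dancer never uses
    height = ys.max() - y0           # scale: same size for every dancer
    out = seq.copy()
    out[:, :, :2] = np.where(visible[..., None], (xy - [x0, y0]) / height, 0.0)
    return out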
The beginning of each unit/figure and the end of each choreography is manually marked with a mouse click. We are confident that this step could be fully automated, but given the purpose of the project it is done manually for now; automation becomes worthwhile if we go further with the project and have thousands of videos to process.
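Such click-based marking can be implemented, for example, with an OpenCV mouse callback. The sketch below plays the video and records the frame index of each click; all names are illustrative, and this is not necessarily our exact script.

import cv2

def collect_click_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    clicks = []

    def on_mouse(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            clicks.append(int(cap.get(cv2.CAP_PROP_POS_FRAMES)))

    cv2.namedWindow("annotate")
    cv2.setMouseCallback("annotate", on_mouse)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("annotate", frame)
        cv2.waitKey(30)  # roughly real-time playback
    cap.release()
    cv2.destroyAllWindows()
    return clicks  # frame indices of figure starts and the final click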
Here we face a supervised learning problem: given a sequence of pose positions covering one 123-567 count (features), predict the name of the sequence (label). 5 labels are possible in the first choreography: "basic", "cuban-basic", "right-turn", "side" and "suzie-q".
The model is a deep network that consists of 2 Gated Recurrent Unit (GRU) layers of 64 and 32 units, respectively. 2 fully connected layers of 16 and 5 neurons are added at the end.
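In Keras-style code the architecture could be sketched like this; we assume a Keras implementation here, and the input shape (any number of frames, 25 joints x 3 values per frame) is a placeholder.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import GRU, Dense

model = Sequential([
    GRU(64, return_sequences=True, input_shape=(None, 75)),  # per-frame pose vectors
    GRU(32),
    Dense(16, activation="relu"),
    Dense(5, activation="softmax"),  # one output per figure label
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])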
To train the model the collected videos were split into training and validation data. From each choreography video we get 8 instances of each sequence. After data augmentation the training and validation sets have 384 and 25 instances per figure, respectively.
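Continuing the sketch above, training is then a standard fit call; the array names and hyperparameters below are illustrative, and the augmentation step itself is not shown.

# X_train: (n, num_frames, 75) pose sequences, y_train: integer labels 0..4.
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=50, batch_size=32)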
The predictions of the model can be tested in 3 ways: (1) from your local terminal, (2) running Streamlit locally, or (3) running Streamlit in the cloud. Because OpenPose needs a long computation time without a GPU (around 11 hours for all our videos), we provide some videos that were already preprocessed with OpenPose.
For the first 2 methods you will need to create an appropriate working environment. In the terminal:
git clone https://github.com/eeroolli/SalsaAnnotation.git
cd SalsaAnnotation
conda create --name testing_salsa python=3.8
conda activate testing_salsa
pip install -r requirements.txt
You can find a more detailed guide in ./env/README.md
From the terminal run the following command:
./run_prediction_in_python.sh predict Ana 4 True
When you run this script, a window with the video to be annotated will pop up. You must click on that window every time the dancer is on count 1, and once at the very end of the video. The choreographies in the Sample folder cover 5 figures that are repeated twice each, so the expected number of clicks in this case is 11 (10 figure starts plus the final click). If you click a different number of times you will get an error.
Streamlit is a way to build frontends for machine learning models that can be run either locally or online. From the terminal run the following command:
streamlit run make_prediction_demo.py
Unfortunately, as of November 2024 this no longer works, because Streamlit no longer supports the Python version we used. Instead, just go to https://share.streamlit.io/eeroolli/salsaannotation/development/make_prediction_demo.py
[1] https://github.com/CMU-Perceptual-Computing-Lab/openpose