Visualizing the temporal structure of video using deep convolutional autoencoders
This repository develops CNN models that embed a video frame-by-frame, turning it into a trajectory through an embedding space.
Please see my blog post for a description of this project. The post focuses on audio embedding, but the models handling video data are almost identical: the video embedding pipeline replaces the 1D convolutions used for audio with 2D convolutions over frames. As of now I haven't written a formal description of the code in this repository.
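Conceptually, the frame-wise embedding works something like the following sketch. This is not the repository's actual code: PyTorch, the layer sizes, the 64x64 input resolution, and the embedding dimension are all illustrative assumptions.

```python
# A minimal sketch (not the repository's actual code) of a 2D convolutional
# autoencoder that maps each video frame to a low-dimensional embedding.
# All layer sizes and the embedding dimension are illustrative assumptions.
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    def __init__(self, embed_dim=32):
        super().__init__()
        # Encoder: 2D convolutions over each frame (the audio version
        # described in the blog post would use 1D convolutions instead).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, embed_dim),
        )
        # Decoder: mirror of the encoder, reconstructing the input frame.
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 32 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # per-frame embedding
        return self.decoder(z), z    # reconstruction and embedding

# Embedding a video frame-by-frame yields a trajectory in embedding space:
model = FrameAutoencoder()
video = torch.rand(100, 3, 64, 64)   # 100 frames of 64x64 RGB (dummy data)
with torch.no_grad():
    _, trajectory = model(video)     # shape: (100, embed_dim)
```

Training would minimize the reconstruction loss between the decoder's output and the input frames; the learned encoder then provides the embedding used to visualize the video's temporal structure.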