Udacity Computer Vision Nanodegree Image Captioning Project
This repository contains a neural network that automatically generates captions for images.
The solution architecture consists of:
- A CNN encoder, which encodes each image into an embedded feature vector.
- A decoder, a sequential neural network built from LSTM units, which translates the feature vector into a sequence of tokens.
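
The encoder-decoder split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: it assumes PyTorch, and the class names (`EncoderCNN`, `DecoderRNN`), layer sizes, and the tiny convolutional stack standing in for a real CNN backbone are all hypothetical choices made for the example. Decoding here is plain greedy argmax, which may differ from what the project uses.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Toy CNN encoder: image batch -> one embedded feature vector per image."""

    def __init__(self, embed_size):
        super().__init__()
        # Small convolutional stack standing in for a real (e.g. pretrained) backbone.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.embed = nn.Linear(32, embed_size)

    def forward(self, images):
        x = self.features(images).flatten(1)  # (batch, 32)
        return self.embed(x)                  # (batch, embed_size)


class DecoderRNN(nn.Module):
    """LSTM decoder: feature vector (+ previous tokens) -> token scores."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Training pass: the image feature is the first input step,
        # followed by every ground-truth token except the last.
        tokens = self.word_embed(captions[:, :-1])
        inputs = torch.cat([features.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)  # (batch, caption_len, vocab_size)

    @torch.no_grad()
    def sample(self, features, max_len=20):
        # Greedy decoding: feed the most likely token back in at each step.
        inputs, state, ids = features.unsqueeze(1), None, []
        for _ in range(max_len):
            hidden, state = self.lstm(inputs, state)
            next_id = self.fc(hidden.squeeze(1)).argmax(dim=1)  # (batch,)
            ids.append(next_id)
            inputs = self.word_embed(next_id).unsqueeze(1)
        return torch.stack(ids, dim=1)  # (batch, max_len)


# Usage with random data (untrained weights, so the tokens are meaningless).
encoder = EncoderCNN(embed_size=64)
decoder = DecoderRNN(embed_size=64, hidden_size=128, vocab_size=100)
images = torch.randn(2, 3, 64, 64)
captions = torch.randint(0, 100, (2, 12))

features = encoder(images)
logits = decoder(features, captions)
token_ids = decoder.sample(features, max_len=10)
```

In a real setup the token ids would be mapped back to words through the vocabulary built from the training captions, and generation would normally stop at an end-of-sentence token rather than always running for `max_len` steps.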
These are some of the outputs produced by the network on the COCO dataset: