Udacity CVND Image Captioning Project

Udacity Computer Vision Nanodegree Image Captioning Project

Introduction

This repository contains a neural network that automatically generates captions for images.

The solution architecture consists of:

A CNN encoder, which encodes each image into an embedded feature vector.
A decoder, a sequential neural network built from LSTM units, which translates the feature vector into a sequence of tokens.
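To make the decoder's output step concrete, here is a minimal sketch of how a sequence of tokens can be produced one step at a time and turned into a caption. It is an illustration only, not this repository's code: the toy vocabulary, the `<start>`/`<end>` token ids, the greedy (highest-score) selection rule, and the names `greedy_decode`, `to_caption`, and `toy_scores` are all assumptions, and `toy_scores` is a fixed lookup table standing in for the LSTM.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary, used only for this illustration.
IDX2WORD: Dict[int, str] = {0: "<start>", 1: "<end>", 2: "a", 3: "dog", 4: "runs"}
START_ID, END_ID = 0, 1


def greedy_decode(
    next_scores: Callable[[List[int]], List[float]],
    max_len: int = 20,
) -> List[int]:
    """Append the highest-scoring next token until END_ID or max_len is reached."""
    tokens = [START_ID]
    while len(tokens) < max_len:
        scores = next_scores(tokens)
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
        if best == END_ID:
            break
    return tokens


def to_caption(tokens: List[int]) -> str:
    """Drop the special tokens and join the remaining words with spaces."""
    words = [IDX2WORD[t] for t in tokens if t not in (START_ID, END_ID)]
    return " ".join(words)


def toy_scores(tokens: List[int]) -> List[float]:
    """Stand-in for the LSTM decoder: a fixed 'previous token -> next token' table."""
    follow = {0: 2, 2: 3, 3: 4, 4: 1}
    scores = [0.0] * len(IDX2WORD)
    scores[follow[tokens[-1]]] = 1.0
    return scores


caption = to_caption(greedy_decode(toy_scores))
print(caption)
```

In a real model, the scoring function would be conditioned on the image feature vector from the encoder, and a different selection rule such as beam search could replace the greedy choice.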

These are some of the outputs given by the network using the COCO dataset:

Repository files

0_Dataset.ipynb
1_Preliminaries.ipynb
2_Training.ipynb
3_Inference.ipynb
README.md
data_loader.py
data_loader_val.py
filelist.txt
model.py
training_log.txt
vocab.pkl
vocabulary.py
workspace_utils.py