"Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe."
This sentence has been circulating on the internet since 2003. It seems, however, that there never was any such research at Cambridge, yet the reason we can read that particular jumbled text has been debated ever since.
Starting from this premise, for the final project of the Machine Learning Engineer Nanodegree at Udacity I built and deployed a model, using PyTorch, AWS SageMaker, and a Jupyter Notebook, that tries to reconstruct the original sentence from the jumbled letters of each word, keeping the first and last letter of every word in place.
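As an illustration of the task, the jumbling itself can be reproduced in a few lines of Python. This is a minimal sketch for clarity; the function below is illustrative and is not the project's actual preprocessing code:

```python
import random

def jumble(word):
    """Shuffle the interior letters of a word, keeping the first and last letter in place."""
    if len(word) <= 3:
        return word  # too short to have letters to shuffle
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

print(" ".join(jumble(w) for w in "according to a researcher at Cambridge University".split()))
```

The model then has to learn the inverse mapping: given the jumbled form, recover the original word.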
For the data, I used a subset of 1,499 blogs out of the 19,320 in the Blogger Corpus. The model is an LSTM encoder-decoder with a fully connected layer on the output.
On a test set of 20,000 words, it reconstructed 30% of the words completely correctly and guessed 63% of the letters correctly. A simple HTML page was also created to use the model from the web.
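A minimal sketch of such an architecture in PyTorch is shown below. The layer sizes, names, and exact wiring are assumptions for illustration and do not reproduce the project's actual configuration:

```python
import torch
import torch.nn as nn

class JumbledWordModel(nn.Module):
    """Character-level LSTM encoder-decoder with a fully connected output layer (illustrative sketch)."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)  # maps decoder states to character logits

    def forward(self, jumbled, target_in):
        # Encode the jumbled word; its final hidden state initialises the decoder,
        # which then predicts the original word one character at a time.
        _, state = self.encoder(self.embedding(jumbled))
        out, _ = self.decoder(self.embedding(target_in), state)
        return self.fc(out)
```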
The following libraries were used in the project:

| Library | Sublibrary |
|---|---|
| io | StringIO |
| os | |
| glob | |
| boto3 | |
| random | |
| pickle | |
| bs4 | BeautifulSoup |
| chardet | |
| sklearn.utils | shuffle |
| itertools | groupby |
| unicodedata | |
| string | |
| argparse | |
| six | BytesIO |
| re | |
| matplotlib.pyplot (as plt) | |
| numpy (as np) | |
| pandas (as pd) | |
| sagemaker | get_execution_role |
| sagemaker.pytorch | PyTorch, PyTorchModel |
| sagemaker.predictor | RealTimePredictor |
| sagemaker_containers | |
| nltk.corpus | stopwords |
| nltk.stem.porter | * |
| torch | torch.utils.data |
| torch.optim (as optim) | |
| torch.nn (as nn) | |
| torch.nn.functional (as F) | |
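For reference, the SageMaker pieces above are typically used along the following lines. This is a hedged sketch based on the SDK v1-style API implied by `RealTimePredictor`; the entry point, instance types, hyperparameters, and S3 paths are assumptions, not the project's actual settings:

```python
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

role = get_execution_role()

# Train the model on SageMaker (entry point script and hyperparameters are illustrative).
estimator = PyTorch(entry_point="train.py",
                    source_dir="source",
                    role=role,
                    framework_version="1.0",
                    py_version="py3",
                    train_instance_count=1,
                    train_instance_type="ml.p2.xlarge",
                    hyperparameters={"epochs": 10})
estimator.fit({"training": "s3://my-bucket/jumbled-words/train"})  # placeholder S3 path

# Deploy an endpoint for real-time inference.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

# The input format accepted by predictor.predict() depends on the project's
# custom input_fn / serializer (e.g. a plain-text RealTimePredictor).
predictor.delete_endpoint()
```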