SoML-50

Handwritten mathematical expression dataset for value evaluation and notation type classification.

The dataset consists of 50,000 images of 384 x 128 pixels (width x height). Each image contains a single expression written in prefix (e.g. +12), postfix (e.g. 12+) or infix (e.g. 1+2) notation. The characters are evenly spaced, so each third of the image (a 128 x 128 region) contains exactly one character.
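Because each character sits in its own third of the image, the 384 x 128 canvas can be split into three 128 x 128 crops, one per character. The minimal sketch below computes those crop boxes. The helper name `char_boxes` is hypothetical, and the boxes use the (left, upper, right, lower) convention expected by Pillow's `Image.crop`.

```python
def char_boxes(width=384, height=128, n_chars=3):
    """Return one (left, upper, right, lower) box per character slot.

    Assumes the characters are evenly spaced horizontally, as described
    for SoML-50, so each slot is width // n_chars pixels wide and spans
    the full image height.
    """
    slot = width // n_chars
    return [(i * slot, 0, (i + 1) * slot, height) for i in range(n_chars)]


if __name__ == "__main__":
    for box in char_boxes():
        print(box)
```

With Pillow installed, `[Image.open(path).crop(b) for b in char_boxes()]` would give the three per-character crops of one image.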

Some sample images from the dataset are shown below:

[Figure: sample images from SoML-50]

The annotations give, for each image, the notation label (prefix, postfix or infix) and the value obtained by evaluating the expression:

| Image | Label | Value |
|---|---|---|
| 100.jpg | prefix | 0 |
| 311.jpg | postfix | 3 |
| 9991.jpg | prefix | 0 |
| 34788.jpg | infix | 7 |
| 34651.jpg | prefix | 16 |
| 34611.jpg | infix | 28 |
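Once the three characters of an expression are known, both annotations follow directly: the operator's position determines the notation (first means prefix, middle means infix, last means postfix), and the value comes from applying the operator to the two digits in left-to-right order. The sketch below assumes single-digit operands and the operators `+`, `-` and `*`. The README does not list the operator set, so any other operator (and, for division, its rounding convention) is deliberately left out. The function name `classify_and_evaluate` is hypothetical.

```python
import operator

# Assumed operator set; the README does not enumerate the operators used.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# The notation is fully determined by where the single operator sits.
NOTATION_BY_POSITION = {0: "prefix", 1: "infix", 2: "postfix"}


def classify_and_evaluate(expr):
    """Return (notation, value) for a 3-character expression such as '+12'."""
    if len(expr) != 3:
        raise ValueError(f"expected 3 characters, got {expr!r}")
    op_positions = [i for i, ch in enumerate(expr) if ch in OPS]
    if len(op_positions) != 1:
        raise ValueError(f"expected exactly one operator in {expr!r}")
    pos = op_positions[0]
    # The two remaining characters are the operands, kept in left-to-right order,
    # so '-93', '9-3' and '93-' all evaluate to 9 - 3.
    digits = [int(ch) for i, ch in enumerate(expr) if i != pos]
    value = OPS[expr[pos]](digits[0], digits[1])
    return NOTATION_BY_POSITION[pos], value


if __name__ == "__main__":
    for e in ["+12", "12+", "1+2", "4*7", "-93"]:
        print(e, classify_and_evaluate(e))
```

Since the label is a function of operator position alone, comparing a recognizer's predicted label against the position of its predicted operator is a cheap consistency check.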

Download

The dataset can be downloaded from Google Drive. When extracted, it contains a directory 'data' with all the images and an 'annotations.csv' file with the label and value of each image.
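A minimal loader for the extracted archive might look like the sketch below. It assumes that `annotations.csv` starts with a header row followed by one row per image in the order image file name, label, value (mirroring the table above), that values are integers, and that the file names resolve inside `data/`. None of this is confirmed by the README, so the code reads columns by position rather than by name. `load_annotations` is a hypothetical helper.

```python
import csv
from pathlib import Path


def load_annotations(root):
    """Read <root>/annotations.csv into a list of (image_path, label, value).

    Assumptions (not confirmed by the README): the first row is a header,
    and each following row holds the image file name, the notation label
    and an integer value, in that order.
    """
    root = Path(root)
    records = []
    with open(root / "annotations.csv", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            image, label, value = (cell.strip() for cell in row[:3])
            records.append((root / "data" / image, label, int(value)))
    return records
```

For example, `load_annotations("path/to/extracted")` would return tuples such as `(Path(".../data/100.jpg"), "prefix", 0)` if the CSV follows the assumed layout.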