Data

The Cambridge-driving Labeled Video Database (CamVid) is a dataset that contains 701 Images captured from the perspective of a car driving on the roads of Cambridge, UK. The images are labelled with 32 semantic classes that include things like the road, footpath, cars, pedestrians, trafic signs, etc.

Input Data

The input images are RGB png images with dimensions of 960x720. Below is a sample of 20 images from the dataset.

Labels

The labels are also encoded as RGB PNG images, with each of the 32 semantic classes represented as a different RGB value.

The mapping of the different semantic classes is as follows:

	Animal (64, 128, 64)
	Archway (192, 0, 128)
	Bicyclist (0, 128, 192)
	Bridge (0, 128, 64)
	Building (128, 0, 0)
	Car (64, 0, 128)
	CartLuggagePram (64, 0, 192)
	Child (192, 128, 64)
	Column_Pole (192, 192, 128)
	Fence (64, 64, 128)
	LaneMkgsDriv (128, 0, 192)
	LaneMkgsNonDriv (192, 0, 64)
	Misc_Text (128, 128, 64)
	MotorcycleScooter (192, 0, 192)
	OtherMoving (128, 64, 64)
	ParkingBlock (64, 192, 128)
	Pedestrian (64, 64, 0)
	Road (128, 64, 128)
	RoadShoulder (128, 128, 192)
	Sidewalk (0, 0, 192)
	SignSymbol (192, 128, 128)
	Sky (128, 128, 128)
	SUVPickupTruck (64, 128, 192)
	TrafficCone (0, 0, 64)
	TrafficLight (0, 64, 64)
	Train (192, 64, 128)
	Tree (128, 128, 0)
	Truck_Bus (192, 128, 192)
	Tunnel (64, 0, 64)
	VegetationMisc (192, 192, 0)
	Void (0, 0, 0)
	Wall (64, 192, 0)

Distribution of classes

The distribution of how much of the image each class takes (as a proportion of the number of pixels in the image) can be viewed in the following plot. We can see that buildings, roads, sky, and trees disproportionately dominate the scenes, and other classes occur with much less frequency.

Data Preparation

Resizing Input images

The input images were resized and reshaped to 256*256 in dimensions, and stored in numpy arrays of shape [n_samples,256,256,3].

Label images

The label images were also resized to 256*256 in dimensions. But in order to be useful for training a deep learning model, they had to be converted from RGB images to single channel images, with the iteger value for each pixel representing the class ID. These were also stored in numpy arrays, but of shape [n_samples, 256,256].

Train/validation split

Out of the 701 samples in the data, 128 were put aside at random to for the validation set. This left 573 images remaining for the training set.

The resulting data had the following shapes:

Dataset	shape
`X_valid`	`[128, 256, 256, 3]`
`Y_valid`	`[128, 256, 256]`
`X_train`	`[573, 256, 256, 3]`
`Y_train`	`[573, 256, 256]`

Data Augmentation

Since the training set was quite small, data augmentation was needed for training in order to allow the model to generalize better.

The following data augmentation steps were taken:

shadow augmentation: with random blotches and shapes of different intensities being overlayed over the image.
random rotation: between -30 to 30 degrees
random crops: between 0.66 to 1.0 of the image
random brightness: between 0.5 and 4, with a standard deviation of 0.5
random contrast: between 0.3 and 5, with a standard deviation of 0.5
random blur: randomly apply small amount of blur
random noise: randomly shift the pixel intensities with a standard devition of 10.

Below is an example of two training images (and their accompanying labels) with data augmentation applied to them five times.

Class weighting

Since the distribution of classes on the images is quite imbalanced, class weighting was used to allow the model to learn about objects that it rarely sees, or which take up smaller regions of the total image. The method used was is from [Paszke et al 2016][paszke2016]

weight_class = 1/ln(c + class_probability)

Where:

c is some constant value that is manually set. As per [Romera et al 2017a][romera2017a], a value of 1.10 was used.

The table below shows the weight applied to the different classes when this formula is used. Greater weight is given to smaller or rarer objects, eg child (10.46), than objects that occur more often and consume large portions of the image, eg sky (4.37).

Class	Weight
Void	8.19641542163
Sky	4.3720296426
Pedestrian	9.88371158915
Child	10.4631076496
Animal	10.4870856376
Tree	5.43594061016
VegetationMisc	9.77766487673
CartLuggagePram	10.4613353846
Bicyclist	9.99897358787
MotorcycleScooter	10.4839211216
Car	7.95097080927
SUVPickupTruck	9.7926997517
Truck_Bus	10.0194993159
Train	10.4920586873
OtherMoving	10.1258833847
Road	3.15728309302
RoadShoulder	10.2342992955
Sidewalk	6.58893008318
LaneMkgsDriv	9.08285926082
LaneMkgsNonDriv	10.4757462996
Bridge	10.4371404226
Tunnel	10.4920560223
Archway	10.4356039685
ParkingBlock	10.1577327692
TrafficLight	10.1479245421
SignSymbol	10.3749043447
Column_Pole	9.60606490919
Fence	9.26646394904
TrafficCone	10.4882678542
Misc_Text	9.94147719336
Wall	9.32269173889
Building	3.47796865386

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data_exloration.md

data_exloration.md

Data

Input Data

Labels

Distribution of classes

Data Preparation

Resizing Input images

Label images

Train/validation split

Data Augmentation

Class weighting

Files

data_exloration.md

Latest commit

History

data_exloration.md

File metadata and controls

Data

Input Data

Labels

Distribution of classes

Data Preparation

Resizing Input images

Label images

Train/validation split

Data Augmentation

Class weighting