In the field of Computer Vision, image datasets play an important role. Anyone starting research in this field first needs to choose a model and a dataset appropriate for the given CV task. It is usually helpful to analyze the data first (e.g., find the proportion of classes with a given number of images, or list the classes that contain exactly n images) to take a closer look at it. This repository provides tools for performing such image dataset analysis.
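For illustration, both analyses mentioned above can be computed from a simple mapping of class names to image counts. The following is a minimal sketch under that assumption, not the repository's implementation; the function names `class_size_distribution` and `classes_with_n_images` and the toy counts are hypothetical.

```python
from collections import Counter


def class_size_distribution(counts: dict[str, int]) -> dict[int, float]:
    """Map each images-per-class value to the fraction of classes that have it."""
    total_classes = len(counts)
    histogram = Counter(counts.values())  # images-per-class -> number of classes
    return {n: k / total_classes for n, k in sorted(histogram.items())}


def classes_with_n_images(counts: dict[str, int], n: int) -> list[str]:
    """Return the sorted names of classes that contain exactly n images."""
    return sorted(name for name, c in counts.items() if c == n)


# Hypothetical toy counts (not taken from LFW).
counts = {"alice": 1, "bob": 3, "carol": 1, "dave": 2}
print(class_size_distribution(counts))   # {1: 0.5, 2: 0.25, 3: 0.25}
print(classes_with_n_images(counts, 1))  # ['alice', 'carol']
```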
Let's choose the LFW (Labeled Faces in the Wild) face database as an example dataset.
First of all, create an instance of the ImageDataset class, passing the path of the dataset's directory as an argument.
The dataset directory must have the following structure to be analyzed correctly:
- image_dataset:
  - class 1:
    - image 1;
    - image 2;
    - ...
  - class 2
  - class 3
  - ...
```python
from image_dataset_analysis import ImageDataset

lfw = ImageDataset("/LFW")
```
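The required layout (one sub-directory per class, images inside it) maps directly onto a per-class image count. As a minimal sketch, assuming exactly that layout and a hypothetical set of image file extensions, the counts can be gathered with `pathlib`; `count_images_per_class` is an illustrative helper, not part of `image_dataset_analysis`.

```python
from pathlib import Path

# Assumed image extensions (hypothetical); adjust them for your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_per_class(root: str) -> dict[str, int]:
    """Count image files in each class sub-directory of root.

    Expects root/<class name>/<image files>. Files lying directly under
    root and sub-directories nested inside a class are ignored.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Called as `count_images_per_class("/LFW")`, it would return a dictionary of class names to image counts, which is the kind of input the per-class statistics in this README are built from.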
Now we can get some useful information about the dataset by calling the following method:
```python
lfw.analyze()
```
Output:
```text
Number of images in image dataset: 13214
Number of classes in image dataset: 5734
Mean number of images per class: 2
Minimum number of images per class: 1
Maximum number of images per class: 530
Formats of images in dataset:
  JPEG: 13213 (100.0%)
Modes of images in dataset:
  RGB: 13213 (100.0%)
Sizes of images in dataset:
  (128, 128): 13213 (100.0%)
Number of classes with only 1 images : 4057
Number of classes with only 2 images : 777
Number of classes with only 3 images : 290
Remaining number of classes : 610
```
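In the LFW output, 4057 + 777 + 290 + 610 = 5734, which equals the number of classes, so "Remaining number of classes" appears to count the classes with more than three images. Likewise, 13214 / 5734 ≈ 2.30, so the reported mean of 2 is evidently rounded or truncated. The sketch below computes the class-level statistics in this way from a class-to-count mapping; it is an illustration under these assumptions, not the repository's `analyze()` implementation, and `summarize` and its keys are hypothetical names. It keeps the exact mean rather than guessing how the repository rounds it.

```python
from collections import Counter
from statistics import mean


def summarize(counts: dict[str, int], small_sizes: tuple[int, ...] = (1, 2, 3)) -> dict[str, float]:
    """Compute dataset-level statistics similar to those printed above."""
    sizes = list(counts.values())
    if not sizes:
        raise ValueError("dataset has no classes")  # min/max/mean need data
    histogram = Counter(sizes)  # images-per-class -> number of classes
    summary = {
        "images": sum(sizes),
        "classes": len(sizes),
        "mean_per_class": mean(sizes),  # exact mean, not rounded
        "min_per_class": min(sizes),
        "max_per_class": max(sizes),
    }
    for n in small_sizes:
        summary[f"classes_with_{n}"] = histogram[n]
    # Classes whose image count is not one of small_sizes.
    summary["remaining_classes"] = len(sizes) - sum(histogram[n] for n in small_sizes)
    return summary
```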
The example.pdf file describes other useful tools in this repository along with their implementations. You can also use example.ipynb to analyze your own image datasets; just don't forget to change the dataset directory path to your own!
Feel free to ask questions or share your ideas in the Issues section.