C-Ditech is an Android application that aims to detect diseases in chickens from images of their feces. The application can detect conditions such as Coccidiosis and Salmonella, as well as recognize healthy feces. Information about each disease and how to handle it is provided in the app. Our team wants to tackle this problem to help farmers detect diseases early and take appropriate measures to prevent further spread and losses.
We use the Poultry Diseases Detection dataset on Kaggle by Kausthub Kannan. The dataset consists of 4 directories, each containing images in JPEG format. There are 4 classes:
- Coccidiosis (2103 images)
- Salmonella (2057 images)
- Newcastle disease (376 images)
- Healthy poultry (2276 images)
The images show poultry feces at varying sizes and orientations. In total, the dataset contains 6,812 images and is 8.64 GB in size.
In data preparation, images are loaded from each directory with the `tensorflow.keras.utils.image_dataset_from_directory` function, because it generates a `tf.data.Dataset`, whose `cache` and `prefetch` methods optimize I/O operations and speed up the training process. Considering the dataset above, the Newcastle disease (NCD) class is heavily underrepresented. Training on such imbalanced data would distort metrics like F1 score, precision, recall, and support. So we downsample all classes except the NCD class: every class is reduced to the same size as the NCD class, which is 376 images.
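A minimal sketch of this loading and downsampling step follows. The directory layout and names, the batch size of 32, and the use of `tf.data.Dataset.sample_from_datasets` to interleave the balanced classes are illustrative assumptions, not details taken from the project.

```python
import tensorflow as tf

# Hypothetical directory layout: one subdirectory per class under data/.
CLASS_DIRS = ["Coccidiosis", "Salmonella", "NCD", "Healthy"]
SAMPLES_PER_CLASS = 376  # match the smallest class (NCD)

per_class = []
for label, name in enumerate(CLASS_DIRS):
    # Load one class at a time so each can be truncated to 376 images.
    class_ds = tf.keras.utils.image_dataset_from_directory(
        f"data/{name}",
        labels=None,        # single-class directory; we attach the label ourselves
        image_size=(224, 224),
        batch_size=None,    # yield single images so take() counts samples
        shuffle=True,
        seed=42,
    )
    class_ds = class_ds.take(SAMPLES_PER_CLASS).map(
        lambda img, lbl=label: (img, lbl)  # default arg pins this class's label
    )
    per_class.append(class_ds)

# Interleave the balanced classes into one stream, then batch and apply
# cache() and prefetch(): decoded images are reused across epochs and data
# preparation overlaps with training, hiding I/O latency.
dataset = tf.data.Dataset.sample_from_datasets(per_class)
dataset = dataset.batch(32).cache().prefetch(tf.data.AUTOTUNE)
```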
Then we apply some augmentations to reduce overfitting on the training data: horizontal flip, vertical flip, random contrast, and random brightness. We make the augmentation layers part of the model so they run on-device, synchronously with the rest of the layers, and benefit from GPU acceleration. We also resize the input images to 224x224; this size has to be defined up front because the transfer learning step that follows requires a specific input size. Lastly, we split the dataset into a training set and a validation set with an 80% : 20% ratio.
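The augmentation block and the split could look like the sketch below. The 0.2 factors for contrast and brightness and the `split_dataset` helper are assumptions for illustration, not values from the project.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# On-device augmentation: as part of the model graph these layers run on
# the GPU and are only active in training mode.
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomContrast(0.2),    # factor 0.2 is an assumed value
        layers.RandomBrightness(0.2),  # factor 0.2 is an assumed value
    ],
    name="augmentation",
)

def split_dataset(ds, n_batches, train_frac=0.8):
    """Split a batched tf.data.Dataset into train/validation by batch count."""
    n_train = int(train_frac * n_batches)
    return ds.take(n_train), ds.skip(n_train)

# Usage with the balanced dataset from the previous sketch
# (4 * 376 = 1504 images -> 47 batches of 32):
# train_ds, val_ds = split_dataset(dataset, n_batches=47)
```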
The model architecture is simple. These are the details of the model (a Keras sketch follows the list):
- Augmentation layer
- Rescaling layer
- Pretrained model (DenseNet121)
- Average pooling layer
- Dense layer (128 units, ReLU)
- Dropout layer (rate 0.3)
- Dense layer (4 units, softmax)
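Putting it together, a minimal Keras sketch of this architecture could look like the following. The layer stack follows the list above; the optimizer, loss, and augmentation factors are assumptions, as the source does not specify them.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 224
NUM_CLASSES = 4

# Augmentation block as described earlier (0.2 factors are assumed values).
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomContrast(0.2),
        layers.RandomBrightness(0.2),
    ],
    name="augmentation",
)

# DenseNet121 pretrained on ImageNet with its classifier head removed;
# the convolutional base is frozen for transfer learning.
base_model = keras.applications.DenseNet121(
    include_top=False,
    weights="imagenet",
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
)
base_model.trainable = False

inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = data_augmentation(inputs)                  # augmentation layer
x = layers.Rescaling(1.0 / 255)(x)             # rescaling layer: [0, 255] -> [0, 1]
x = base_model(x, training=False)              # pretrained DenseNet121
x = layers.GlobalAveragePooling2D()(x)         # average pooling over feature maps
x = layers.Dense(128, activation="relu")(x)    # dense layer, 128 units, ReLU
x = layers.Dropout(0.3)(x)                     # dropout, rate 0.3
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)  # 4-way softmax

model = keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",                          # assumed optimizer choice
    loss="sparse_categorical_crossentropy",    # integer labels from the pipeline
    metrics=["accuracy"],
)
```

Freezing the DenseNet121 base and training only the small head is the standard first stage of transfer learning; the base can optionally be unfrozen later for fine-tuning at a lower learning rate.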