Page 76 - IJEEE-2023-Vol19-ISSUE-1


72 | Abdulla & Marhoon

• Data Collection and Preprocessing Phase
Images of seven varieties of common tomato diseases appearing in Iraqi farms were taken from the PlantVillage tomato leaf dataset, a total of 11,192 images. Fig. 3 shows a sample of these images. The dataset was then preprocessed and split into training, validation, and testing sets, as shown in Fig. 4. Because the training set was unbalanced, data augmentation (flipping, zooming, and brightness scaling) was applied to balance the classes and to avoid overfitting during training; this process generated 3,846 images. Fig. 5 shows the training dataset before and after augmentation.

We noticed that the PlantVillage images are laboratory data and lack the diversity of the field environment. The model must be trained on data with varied features to make its classification more robust. A dataset of tomato diseases totaling 114 images was therefore collected from Google Images; Fig. 6 shows a sample of these images. Since so few images were obtained, they were augmented using the same measures applied to the PlantVillage dataset. Table I lists the number of images collected from Google and their number after augmentation; the augmented (generated) images were used for training and the original images for testing. Finally, the images were resized from 256 x 256 to 224 x 224 pixels to match the input size of the pre-trained networks used.
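The preprocessing steps described above (splitting, class balancing, flipping, zooming, brightness scaling, and resizing to 224 x 224) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each image is assumed to be a list of rows of (R, G, B) tuples, and the 70/15/15 split, the zoom range (up to 1.2x), the brightness range (0.8 to 1.2), nearest-neighbour interpolation, and the rule of topping every class up to the size of the largest one are all hypothetical choices, since the text does not specify them. All function names are illustrative.

```python
import random

# Minimal sketch (not the authors' code). Assumptions: an image is a list of rows,
# each pixel an (R, G, B) tuple of ints in 0..255; all ratios/ranges are hypothetical.


def stratified_split(files_by_class, rng, train_pct=70, val_pct=15):
    """Shuffle each class separately and cut it into train/val/test lists.
    The 70/15/15 percentages are placeholders; the paper gives the split only in Fig. 4."""
    split = {"train": [], "val": [], "test": []}
    for label, files in files_by_class.items():
        files = list(files)
        rng.shuffle(files)
        n_train = len(files) * train_pct // 100
        n_val = len(files) * val_pct // 100
        split["train"] += [(f, label) for f in files[:n_train]]
        split["val"] += [(f, label) for f in files[n_train:n_train + n_val]]
        split["test"] += [(f, label) for f in files[n_train + n_val:]]
    return split


def copies_needed(class_counts):
    """Augmented images to add so every class matches the largest one
    (one possible balancing rule; the paper does not state its target)."""
    target = max(class_counts.values())
    return {label: target - n for label, n in class_counts.items()}


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize, e.g. 256 x 256 -> 224 x 224."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def zoom_in(img, factor):
    """Crop the central 1/factor region and scale it back to the original size."""
    h, w = len(img), len(img[0])
    crop_h, crop_w = max(1, int(h / factor)), max(1, int(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    crop = [row[left:left + crop_w] for row in img[top:top + crop_h]]
    return resize_nearest(crop, h, w)


def scale_brightness(img, scale):
    """Multiply every channel by `scale` and clip to 0..255."""
    return [[tuple(min(255, max(0, round(c * scale))) for c in px) for px in row]
            for row in img]


def random_augment(img, rng, max_zoom=1.2, brightness=(0.8, 1.2)):
    """Random flip, zoom and brightness change (hypothetical ranges)."""
    out = flip_horizontal(img) if rng.random() < 0.5 else img
    out = zoom_in(out, rng.uniform(1.0, max_zoom))
    return scale_brightness(out, rng.uniform(*brightness))
```

For example, `resize_nearest(img, 224, 224)` maps a 256 x 256 image to the 224 x 224 input size mentioned in the text. In practice an image library or a deep-learning framework's preprocessing utilities would normally replace these loops; the pure-Python version only makes each transformation explicit.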

Fig. 3: Sample of the images used in training the models.
Fig. 4: The splitting of the dataset.
Fig. 5: Number of images in the dataset after augmentation.
Fig. 6: Sample of tomato disease images from Google Images.

TABLE I: NUMBER OF IMAGES COLLECTED FROM GOOGLE AND AUGMENTED.

Tomato disease          Actual images  Augmented images
Mosaic virus                       13               130
Yellow leaf curl virus             15               120
Bacterial spot                     18               126
Healthy                             7                56
Early blight                       21               168
Late blight                        21               168
Leaf mold                          19               112
Total                             114               880
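As a quick arithmetic check on Table I, the per-class counts can be summed and the augmentation factor (augmented / actual) computed for each class. The dictionary below is transcribed from the table; pairing each row of counts with the label listed in the same position is an interpretation of the extracted table layout, and the variable names are illustrative.

```python
# Counts transcribed from Table I: (actual Google images, images after augmentation).
# Row-to-label pairing follows the order of the labels in the table (an interpretation).
table_i = {
    "Mosaic virus": (13, 130),
    "Yellow leaf curl virus": (15, 120),
    "Bacterial spot": (18, 126),
    "Healthy": (7, 56),
    "Early blight": (21, 168),
    "Late blight": (21, 168),
    "Leaf mold": (19, 112),
}

actual_total = sum(actual for actual, _ in table_i.values())
augmented_total = sum(aug for _, aug in table_i.values())
# Augmentation factor per class: augmented images produced per original image.
factors = {name: aug / actual for name, (actual, aug) in table_i.items()}

print(actual_total, augmented_total)  # 114 880
for name, factor in factors.items():
    print(f"{name}: x{factor:.2f}")
```

The totals reproduce the 114 collected images stated in the text and the 880 augmented images; most classes were expanded by a factor of 7 to 10, while the 19-image row works out to about 5.9.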
