IJEEE-2023-Vol19-ISSUE-1

Abdulla & Marhoon

  • Data Collection and Preprocessing Phase
Images of seven varieties of common tomato diseases that appear on Iraqi farms were taken from the PlantVillage tomato leaf dataset, a total of 11,192 images. Fig. 3 shows a sample of these images. The images were pre-processed, and the dataset was split into training, validation, and testing sets, as shown in Fig. 4. Because the training set was unbalanced, it was augmented by flipping, zooming, and brightness scaling to remove the imbalance and avoid overfitting during training; this process generated 3,846 images. Fig. 5 shows the training dataset before and after augmentation.

We noticed that the PlantVillage images are laboratory data and lack the diversity of a field environment. For the model to classify images correctly, it must be trained on data with more varied features. A dataset of tomato disease images was therefore collected from Google Images, totaling 114 images; Fig. 6 shows a sample of them. Since so few images were obtained, they were augmented using the same operations applied to the PlantVillage dataset. Table I shows the number of images collected from Google and their number after augmentation; the generated (augmented) images were used for training and the actual images for testing. Finally, the images were resized from 256 x 256 to 224 x 224 pixels to match the input size of the pre-trained networks used.
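The data-preparation steps above (split, class balancing by augmentation, flipping, zooming, brightness scaling, and resizing to 224 x 224) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: images are simplified to 2-D lists of grayscale intensities; the 70/15/15 split, the 50% flip probability, the zoom range (1.0 to 1.2), the brightness range (0.8 to 1.2), the rule of augmenting every class up to the size of the largest class, and nearest-neighbour resizing are all hypothetical choices; and every function name is illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List

# Simplification (assumption): a grayscale image as rows of intensities in [0, 255]
Image = List[List[float]]


def stratified_split(samples, percents=(70, 15, 15), seed=0):
    """Split (image, label) pairs into train/val/test sets class by class.

    The 70/15/15 ratio is a hypothetical placeholder for the split shown in Fig. 4.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    train, val, test = [], [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_train = len(group) * percents[0] // 100  # integer arithmetic avoids float rounding
        n_val = len(group) * percents[1] // 100
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


def augmentations_needed(counts: Dict[str, int]) -> Dict[str, int]:
    """Extra images per class so each class matches the largest one (assumed rule)."""
    target = max(counts.values())
    return {name: target - n for name, n in counts.items()}


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale pixel intensities by `factor`, clipped to [0, 255]."""
    return [[min(255.0, max(0.0, p * factor)) for p in row] for row in img]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def zoom_in(img: Image, factor: float) -> Image:
    """Crop the central 1/factor region and scale it back to the original size."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h / factor)), max(1, int(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = [row[left:left + cw] for row in img[top:top + ch]]
    return resize_nearest(crop, h, w)


def augment(img: Image, rng: random.Random) -> Image:
    """One random augmented copy: optional flip, mild zoom, brightness change."""
    out = hflip(img) if rng.random() < 0.5 else img
    out = zoom_in(out, rng.uniform(1.0, 1.2))
    return adjust_brightness(out, rng.uniform(0.8, 1.2))


if __name__ == "__main__":
    rng = random.Random(0)
    leaf = [[float((r + c) % 256) for c in range(256)] for r in range(256)]  # dummy 256 x 256 image
    net_input = resize_nearest(augment(leaf, rng), 224, 224)  # 224 x 224 network input
    print(len(net_input), len(net_input[0]))  # 224 224
    print(augmentations_needed({"class_a": 10, "class_b": 4}))  # {'class_a': 0, 'class_b': 6}
```

In a real pipeline these operations would normally come from an image-processing or deep-learning library, usually with bilinear rather than nearest-neighbour resizing; the sketch only makes the order of operations concrete: split first, augment only the training images, then resize every image to the 224 x 224 input size.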

Fig. 3: Sample of the images used in training the models.
Fig. 4: Splitting of the dataset.
Fig. 5: Number of images in the training dataset after augmentation.
Fig. 6: Sample of tomato disease images from Google Images.

TABLE I: NUMBER OF IMAGES COLLECTED FROM GOOGLE AND AFTER AUGMENTATION

Tomato disease          Actual images    Augmented images
Mosaic virus                       13                 130
Yellow leaf curl virus             15                 120
Bacterial spot                     18                 126
Healthy                             7                  56
Early blight                       21                 168
Late blight                        21                 168
Leaf mold                          19                 112
Total                             114                 880
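As a quick arithmetic check of Table I, the per-class counts can be summed and compared with the Total row. The short Python sketch below uses only the (actual, augmented) value pairs as they appear in the table, independent of which class each pair belongs to; the variable names are illustrative.

```python
# (actual, augmented) image counts for each class row of Table I, Total row excluded
rows = [(13, 130), (15, 120), (18, 126), (7, 56), (21, 168), (21, 168), (19, 112)]

total_actual = sum(actual for actual, _ in rows)
total_augmented = sum(augmented for _, augmented in rows)

# Per-row expansion factor: augmented images generated per collected image
factors = [round(augmented / actual, 2) for actual, augmented in rows]

print(total_actual, total_augmented)             # 114 880
print(round(total_augmented / total_actual, 2))  # 7.72
print(factors)                                   # [10.0, 8.0, 7.0, 8.0, 8.0, 8.0, 5.89]
```

The rows add up to the stated totals of 114 actual and 880 augmented images (about 7.7 augmented images per collected image overall), while the per-row factors range from about 5.9 to 10, so the collected images were not all expanded by the same multiple.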