Page 274 - 2024-Vol20-Issue2
270 | Qadir, Abdalla & Abd
Fig. 2. Samples of the dataset: (A) Benign, (B) Malignant, and (C) Normal.
images was the DICOM format. The CT procedure used 120 kV, window sizes of 350 to 1200 HU,
slices of 1 mm thickness, and window centers ranging from 50 to 600 for reading, with scans
taken at full inspiration under breath-hold. Before analysis, all images were de-identified.
Each scan comprises between 80 and 200 slices, each of which is an image of the patient's
chest shown from a different perspective and side. The benign patients varied in age, gender,
geographical location, educational level, and living conditions. Some of these patients were
workers and farmers, while others were employees of the Transport and Oil Ministry of Iraq.
Most lived in the provinces of Babylon, Wasit, Salahuddin, Diyala, and Baghdad. Fig. 2 shows
a sample of our dataset. The dataset was divided into three categories: malignant (40),
benign (15), and normal (55).
Benign lesions, such as benign tumors or cysts, are non-cancerous but may resemble cancer on
scans. They typically have well-defined borders and uniform appearances, unlike malignant
tumors, which are often irregular and rapidly changing. A common benign lung lesion is a
hamartoma, characterized by its slow growth and well-circumscribed nature [22], while normal
lung imaging shows healthy lung tissue with no abnormalities. These images are essential for
comparison to identify pathological changes in subsequent scans [23].
C. Preprocessing
In this study, two major preprocessing steps are utilized. First, the dataset images, which
had an original size of 512x512 pixels, were reduced to 224x224, the default input size of
the VGG16 network. Second, all pixels were normalized to the range of 0 to 1 in order to
reduce the computational complexity of the image recognition process. We addressed the class
imbalance in our dataset by applying stratified 5-fold cross-validation. This technique is
particularly beneficial for imbalanced datasets, as it ensures that each fold maintains a
consistent ratio of the different class labels, closely mirroring the original dataset's class
distribution. By employing this method, we aimed to achieve a more reliable and unbiased
evaluation of our model's performance. Each of the five folds was used once as a validation
set, while the remaining four folds served as the training set. This approach not only allowed
for a thorough assessment of the model across different subsets of the data but also helped
mitigate the potential bias that could arise from the uneven class distribution. The
consistent representation of classes in each fold helped in validating the robustness and
generalizability of our model, ensuring that the validation metrics reflected its performance
across the diverse class spectrum [24].
D. Feature Extraction using VGG-16
The VGG network design was first proposed by Simonyan and Zisserman [25]. Two variations of
the model were created, a 16-layer (VGG-16) and a 19-layer (VGG-19) network, with the aim of
entering them in the ImageNet challenge in 2014, where the Visual Geometry Group (VGG) team
took first place in the localization track and second place in the classification track. The
architecture of VGG16 is composed of five convolutional layer blocks followed by three fully
connected layers. The convolutional layers keep each activation map at the same spatial
dimensions as the layer below it by using 3x3 kernels with a padding and stride of 1.
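The size-preserving behavior of 3x3 kernels with a padding and stride of 1 follows from the usual output-size arithmetic for a sliding window. The sketch below is illustrative only: the helper name `conv_output_size` is hypothetical and does not come from the paper or from any library.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size produced when a kernel slides over an input of n pixels.

    Standard formula: floor((n + 2 * padding - kernel) / stride) + 1.
    """
    return (n + 2 * padding - kernel) // stride + 1


# With a 3x3 kernel, padding 1, and stride 1, the activation map keeps its size
# at every resolution checked here.
for n in (224, 112, 56, 28, 14):
    assert conv_output_size(n, kernel=3, stride=1, padding=1) == n
```

Because padding adds two pixels per dimension and the 3x3 kernel consumes two, the spatial size is unchanged and only the channel depth varies within a block.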
Each convolution is followed by a rectified linear unit (ReLU) activation, and each block
ends with a max pooling operation that reduces the spatial dimensions. The max pooling layers
use 2x2 kernels with a stride of 2 and no padding, so each spatial dimension of the activation
map from the previous layer is cut in half. The final layer is a fully connected softmax layer
with 1,000 units, which follows two fully connected layers of 4,096 units each with ReLU
activation functions.
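As a rough check on the architecture described above, the following sketch traces the spatial size through the five pooling stages and counts the weights and biases of the three fully connected layers. It assumes 512 channels in the last convolutional block, which is the standard VGG-16 configuration but is not stated in this section. The helper names are hypothetical.

```python
def pooled_size(n: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial size after max pooling without padding."""
    return (n - kernel) // stride + 1


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Each of the five blocks ends with 2x2 pooling at stride 2, halving the size.
sizes = [224]
for _ in range(5):
    sizes.append(pooled_size(sizes[-1]))
# sizes is now [224, 112, 56, 28, 14, 7]

LAST_BLOCK_CHANNELS = 512  # assumption: standard VGG-16, not given in the text
flattened = sizes[-1] * sizes[-1] * LAST_BLOCK_CHANNELS  # 7 * 7 * 512 = 25088

# Two 4,096-unit layers followed by the 1,000-unit softmax layer.
head_params = (
    dense_params(flattened, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)
print(sizes, flattened, head_params)
```

Under these assumptions the fully connected head holds 123,642,856 parameters, which matches the figure of roughly 123 million that the paper attributes to these layers.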
At the same time, one drawback of the VGG16 model is that it is computationally expensive
to evaluate and demands substantial memory because of its large number of parameters. The
model contains almost 138 million parameters. In the proposed model, an XGBoost classifier
replaces the fully connected layers, in which approximately 123 million parameters reside.
These layers account for the bulk of the parameters; this