
combine first to obtain the features that are highly interrelated with each other and to determine which of these features has the highest degree of correlation with the target class. We create a new filter method that combines PC and MI, called feature selection based on PC and MI (PC-MI).

Firstly, this method finds the features that have a PC value greater than or equal to 0.89 by analysing the correlation heat map of features shown in Fig. 4. Secondly, it merges the sets of interrelated features that contain common features into one group. Thirdly, one common feature, namely the one with the highest MI value, is chosen from each group. Finally, the remaining features in each group are dropped from the dataset (Table II). Fig. 5 shows the proposed method of our work.

Fig. 5. Proposed feature selection method.

The result of the proposed feature selection method is that 12 features are dropped from the dataset; therefore, only the remaining 18 features will be used. We remove unimportant features that hinder the work of the ML model and keep the features that help the model learn correctly and give the best classification accuracy.

TABLE II. FEATURES DROPPED FROM THE DATASET

No  Features          No  Features
1   area worst        7   compactness mean
2   radius worst      8   concavity mean
3   perimeter mean    9   concave points worst
4   radius mean       10  radius se
5   area mean         11  perimeter se
6   texture mean      12  compactness worst

We note from the previous table that 12 features are dropped from the dataset: five features are dropped from the first group, one from the second group, three from the third group, two from the fourth group and one from the fifth group. As a result, we are left with the 18 best features.

Table III shows that five groups are found. Each group contains pairs of features that are strongly connected to each other, and all the pairs that belong to one group have common features. From each group, the feature with the highest MI value is selected as the one with the strongest correlation with the target class.
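Below is a minimal Python sketch of the PC-MI filter described above, assuming the features are held in a pandas DataFrame X and the target class in a vector y. The function name pc_mi_select, the use of absolute correlations and scikit-learn's mutual_info_classif as the MI estimator are illustrative choices on our part; only the 0.89 PC threshold and the keep-the-highest-MI rule come from the steps described above.

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def pc_mi_select(X: pd.DataFrame, y, pc_threshold: float = 0.89):
    """Group features whose pairwise PC is >= pc_threshold and keep only
    the feature with the highest MI score from each group."""
    corr = X.corr(method="pearson").abs()                       # values behind the heat map (Fig. 4)
    mi = pd.Series(mutual_info_classif(X, y), index=X.columns)  # MI of each feature with the class

    # Merge correlated pairs that share a common feature into one group
    groups = []
    cols = list(corr.columns)
    for i, f1 in enumerate(cols):
        for f2 in cols[i + 1:]:
            if corr.loc[f1, f2] >= pc_threshold:
                hit = [g for g in groups if f1 in g or f2 in g]
                merged = set().union({f1, f2}, *hit)
                groups = [g for g in groups if g not in hit] + [merged]

    # From each group keep the highest-MI feature and drop the rest
    to_drop = set()
    for g in groups:
        keep = mi[list(g)].idxmax()
        to_drop |= g - {keep}
    return X.drop(columns=sorted(to_drop)), sorted(to_drop)

Note that mutual_info_classif relies on a nearest-neighbour estimator, so its random_state should be fixed if exactly reproducible MI scores (and hence exactly the drops listed in Table II) are required.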

4) Normalizing the Selected Features: After selecting the best features from the dataset through the proposed feature selection method, we normalise the remaining features using StandardScaler. The main objective of StandardScaler is to convert feature values into standard units that are free from the influence of the arithmetic mean and dispersion, so the resulting values are independent of the units of measurement. It can be computed from eq. 3 [22]:

Z = (X - X̄) / S                                   (3)

where:
• Z: StandardScaler score
• X: sample value
• X̄: arithmetic mean
• S: standard deviation
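As a brief illustration of eq. 3, the following sketch applies scikit-learn's StandardScaler to a stand-in matrix (the array X_sel, its size and its random values are placeholders for the 18 selected features) and checks that the result matches the direct computation of Z:

import numpy as np
from sklearn.preprocessing import StandardScaler

X_sel = np.random.RandomState(0).rand(100, 18)   # placeholder for the 18 selected features

scaler = StandardScaler()
Z = scaler.fit_transform(X_sel)                  # Z = (X - X̄) / S, computed column by column

# The same values obtained directly from eq. 3 (population standard deviation)
assert np.allclose(Z, (X_sel - X_sel.mean(axis=0)) / X_sel.std(axis=0))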

C. Prediction Phase

After the pre-processing of the dataset and the selection of the appropriate features, the dataset is ready to work with the ML model for making predictions. Therefore, in this section, we explain the mechanism for dividing the dataset, the ML models used and the proposed model that will be used in the prediction process.

1) Splitting Dataset: The dataset will be split into two parts. The first part is the training set, a set of data used to train and build the model. The second part is the testing set, a set of data on which the performance of the model is tested using a specific metric. In this paper, two methods of splitting are used, as follows: