
                                TABLE III.
                   PC-MI METHOD FOR FEATURE SELECTION

Group  Correlated features                           Pearson correlation score  Chosen feature (highest mutual information value)
  1    [radius mean, area worst]                     0.947                      perimeter worst
  1    [radius mean, perimeter worst]                0.967
  1    [radius mean, radius worst]                   0.971
  1    [radius mean, area mean]                      0.989
  1    [radius mean, perimeter mean]                 0.998
  1    [area mean, area worst]                       0.962
  1    [area mean, perimeter worst]                  0.961
  1    [area mean, radius worst]                     0.965
  1    [radius worst, area worst]                    0.986
  1    [radius worst, perimeter worst]               0.994
  1    [perimeter worst, area worst]                 0.980
  1    [perimeter mean, area worst]                  0.947
  1    [perimeter mean, perimeter worst]             0.972
  1    [perimeter mean, radius worst]                0.970
  1    [perimeter mean, area mean]                   0.988
  2    [texture mean, texture worst]                 0.907                      texture worst
  3    [compactness mean, concavity mean]            0.892                      concave points mean
  3    [concavity mean, concave points mean]         0.930
  3    [concave points mean, concave points worst]   0.913
  4    [radius se, area se]                          0.970                      area se
  4    [radius se, perimeter se]                     0.938
  4    [perimeter se, area se]                       0.896
  5    [compactness worst, concavity worst]          0.956                      concavity worst
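For illustration only, the selection strategy behind Table III can be sketched as follows, assuming scikit-learn and pandas with the Wisconsin breast cancer data bundled in scikit-learn. This is a simplified pairwise variant of the group-based procedure, and the 0.89 correlation threshold is an assumed value for the sketch, not one reported here.

# Sketch of a PC-MI style selection: find highly correlated feature pairs with
# Pearson correlation, then keep the feature of each pair with the higher
# mutual information with the target.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
corr = X.corr(method="pearson")

# Collect correlated pairs above the (assumed) threshold and drop, from each
# pair, the feature with the lower mutual information value.
threshold = 0.89
to_drop = set()
for i, f1 in enumerate(X.columns):
    for f2 in X.columns[i + 1:]:
        if abs(corr.loc[f1, f2]) > threshold:
            to_drop.add(f1 if mi[f1] < mi[f2] else f2)

selected = [f for f in X.columns if f not in to_drop]
print(selected)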

    • train-test split (training = 0.8, testing = 0.2) and k-fold cross-validation (k = 10), as in the sketch below.
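A minimal sketch of these two validation schemes, assuming scikit-learn; the logistic regression estimator, the scaling step and the random_state value are placeholders, not choices reported here.

# 80/20 train-test split and 10-fold cross-validation on the Wisconsin data.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)      # training = 0.8, testing = 0.2

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=10)     # k-fold with k = 10
print(scores.mean())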
    2) Classification Models: In this part, the ML models used in this study are explained and clarified.
    Logistic Regression: LR is a statistical model that represents the relationship between one or more independent variables and a qualitative dependent variable that can take only discrete values. It is used to investigate the influence of predictor variables on categorical outcomes. In epidemiologic studies, logistic models are frequently used to analyse the connections between risk factors and the development of disease, and they are also often used in medical publications that do not specialize in epidemiology and public health [23].
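For illustration, a minimal LR fit of this kind, assuming scikit-learn and its bundled Wisconsin breast cancer data; the scaling step is added only to help the solver converge and the coefficient inspection is a sketch of how predictor influence can be read off.

# Binary (discrete) outcome modelled from continuous predictors; the largest
# absolute coefficients indicate the strongest influence on the predicted class.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = lr.named_steps["logisticregression"].coef_[0]

top = np.argsort(np.abs(coefs))[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]}: {coefs[i]:+.2f}")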
    Support Vector Machine: SVM, one of the most significant and powerful ML models, needs access to all of the training data when learning the parameters of the model during the training phase. Support vectors, a subset of these training examples, are the only ones on which SVM relies to make predictions in the future. The margin of the hyperplane is determined by the support vectors. The major goal of the training phase is to find the hyperplane that separates the two classes with the greatest margin. When a problem is not linearly separable in the input space, a kernel can map the data into a higher-dimensional space, called the kernel space, where the data become linearly separable. A linear hyperplane can then be obtained in the kernel space to divide the classes involved in the classification task. This approach is appealing because, compared with learning a nonlinear surface, the cost of moving to kernel space is minimal [24].
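A brief sketch of the kernel idea, assuming scikit-learn: the same SVC evaluated with a linear hyperplane in the input space and with an RBF kernel, i.e. a linear hyperplane in kernel space (default, untuned hyperparameters).

# Compare a linear-kernel SVM with an RBF-kernel SVM under 10-fold CV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

for kernel in ("linear", "rbf"):
    svm = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    print(kernel, cross_val_score(svm, X, y, cv=10).mean())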
    Decision Tree: A DT is one of the most important models in decision-making processes, and it is widely used in the field of ML. Trees are built from top to bottom, and the nodes of the tree, which represent features, are selected based on a certain scale (information gain in this study). At each node of the tree, a specific decision is made, and this decision directs you to the next level of the tree until a leaf node, which gives the final decision, is reached [25].
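A minimal DT sketch, assuming scikit-learn, with information gain (entropy) as the splitting scale; the max_depth value is an illustrative choice, not one taken from this study.

# Grow a small tree top-down with the entropy (information gain) criterion
# and print its decision rules and held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

dt = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
dt.fit(X_train, y_train)
print(export_text(dt, feature_names=list(data.feature_names)))
print(dt.score(X_test, y_test))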
    Voting Classifier: This is a type of ensemble classifier that combines a set of ML models into a single model that carries the strength of the combined models, which gives the best prediction accuracy [26]. Here, we use a soft voting classifier with three input models (LR, SVM and DT), which were found, based on a set of experiments, to be the models that work best with a voting classifier on this dataset. This classifier works on a probabilistic basis: each of the input models produces a probability value for class 0 and class 1, and for the final result the soft voting classifier uses the highest probability across all the input models, as shown in Fig. 6.
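A minimal sketch of such a soft voting ensemble, assuming scikit-learn; the hyperparameters are illustrative rather than the tuned values used in this study. Note that scikit-learn's soft voting averages the class probabilities of the input models and predicts the class with the larger mean.

# Soft voting over LR, SVM and DT; SVC needs probability=True for soft voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

voting = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("dt", DecisionTreeClassifier(criterion="entropy", random_state=0)),
    ],
    voting="soft",
)
print(cross_val_score(voting, X, y, cv=10).mean())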
    Finally, we can summarize the proposed methodology as follows. Firstly, we carry out some preliminary treatments to improve the dataset. Secondly, the best features are