Page 51 - 2023-Vol19-Issue2

47 |                                                                                                             Hashim & Yassin

tionship (correlation) between features. This measure quantifies the degree of correlation between every pair of features, with values in the range [-1, 1]:

    Score (1): The two features are perfectly positively correlated (directly proportional).

    Score (0): There is no correlation between the two features.

    Score (-1): The two features are perfectly negatively correlated (inversely proportional).

    The PC coefficient between two features can be computed using (1) [19].

$$
r = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sqrt{\left[ N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2 \right] \left[ N \sum_{i=1}^{N} y_i^2 - \left( \sum_{i=1}^{N} y_i \right)^2 \right]}} \tag{1}
$$

    Fig. 4 displays the heat map of Pearson correlation scores between WDBC features.

 Fig. 4. Heat map of correlations between WDBC features.

    2) Feature Selection Based on Mutual Information: Mutual Information (MI) is a measure of the dependency between each feature and the target class. It is used to find the features that are most closely related to the target. The resulting value ranges between [0,1], where 0 indicates an independent feature and 1 indicates a dependent feature. The MI value is computed by (2) [20]:

$$
I(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(X_i, Y_j) \cdot \log \left[ \frac{P(X_i \mid Y_j)}{P(X_i)} \right] \tag{2}
$$

    After applying (2) to the dataset used in this study, Table I lists each feature and its MI value in descending order.

TABLE I.
MUTUAL INFORMATION VALUE FOR EACH FEATURE

    Feature                      Value
    perimeter worst              0.499442
    area worst                   0.490916
    radius worst                 0.483495
    concave points mean          0.474548
    concave points worst         0.471331
    perimeter mean               0.427724
    concavity mean               0.421748
    radius mean                  0.407724
    area mean                    0.405551
    area se                      0.366311
    concavity worst              0.358751
    perimeter se                 0.284167
    compactness worst            0.283761
    radius se                    0.277863
    compactness mean             0.276540
    concavity se                 0.178546
    concave points se            0.177446
    texture mean                 0.145510
    texture worst                0.139945
    smoothness worst             0.120389
    compactness se               0.119047
    smoothness mean              0.108997
    symmetry worst               0.101290
    fractal dimension worst      0.097076
    symmetry mean                0.070514
    fractal dimension se         0.048942
    symmetry se                  0.027308
    fractal dimension mean       0.023849
    smoothness se                0.023746
    texture se                   0.002271

    3) Feature Selection Based on Pearson Correlation and Mutual Information (PC-MI): The filter method is considered the best, simplest, and least costly way to select features, because it selects features based on correlation analysis and is independent of the ML model used [21]. PC is used to find the degree of relationship between one feature and another in the dataset, while MI, applied separately, determines the degree of relationship of each feature to the target class. Therefore, these two methods will