tionship (correlation) between features. This scale measures the degree of correlation between all features, where the value of the relationship ranges between [-1,1]:

Score (1): This value indicates that the correlation between the two features is completely directly proportional.

Score (0): This value denotes the absence of correlation between the two features.

Score (-1): This value indicates that the correlation between the two features is inversely proportional.

The PC coefficient between two features can be measured through (1) [19]:

r = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sqrt{\left[ N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2 \right] \left[ N \sum_{i=1}^{N} y_i^2 - \left( \sum_{i=1}^{N} y_i \right)^2 \right]}}    (1)

Fig. 4 displays the heat map of Pearson correlation scores between WDBC features.

Fig. 4. Heat map of correlations between WDBC features.
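As an illustration, the sketch below computes the pairwise Pearson correlation matrix of (1) over the WDBC features and renders a heat map analogous to Fig. 4. It assumes scikit-learn's bundled copy of the WDBC dataset together with pandas and matplotlib; the exact preprocessing used in this study may differ.

```python
# Sketch: pairwise Pearson correlations over the WDBC features (Eq. 1) and a
# heat map in the spirit of Fig. 4. Assumes scikit-learn's copy of WDBC.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X = data.data                                # 30 WDBC features as a DataFrame

corr = X.corr(method="pearson")              # pairwise r values in [-1, 1]

fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90, fontsize=6)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns, fontsize=6)
fig.colorbar(im, ax=ax, label="Pearson r")
plt.tight_layout()
plt.show()
```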
2) Feature Selection Based on Mutual Information: Mutual Information (MI) is a measure of the dependency between each feature and the target class. This measure is used to identify the features most closely related to the target. The resulting value ranges between [0,1], where value (0) represents independent features, and value (1) refers to dependent features. The MI value is computed by (2) [20]:

I(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(X_i, Y_j) \log \left[ \frac{P(X_i \mid Y_j)}{P(X_i)} \right]    (2)

After applying (2) to the dataset used in this study, Table I shows each feature and its value in descending order.
TABLE I. MUTUAL INFORMATION VALUE FOR EACH FEATURE

Feature                    MI value
perimeter worst            0.499442
area worst                 0.490916
radius worst               0.483495
concave points mean        0.474548
concave points worst       0.471331
perimeter mean             0.427724
concavity mean             0.421748
radius mean                0.407724
area mean                  0.405551
area se                    0.366311
concavity worst            0.358751
perimeter se               0.284167
compactness worst          0.283761
radius se                  0.277863
compactness mean           0.276540
concavity se               0.178546
concave points se          0.177446
texture mean               0.145510
texture worst              0.139945
smoothness worst           0.120389
compactness se             0.119047
smoothness mean            0.108997
symmetry worst             0.101290
fractal dimension worst    0.097076
symmetry mean              0.070514
fractal dimension se       0.048942
symmetry se                0.027308
fractal dimension mean     0.023849
smoothness se              0.023746
texture se                 0.002271
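For reference, a minimal sketch of how such an MI ranking can be produced is given below. It relies on scikit-learn's mutual_info_classif, which estimates MI with a nearest-neighbour method rather than the discrete form of (2), so its scores will only approximate the values in Table I.

```python
# Sketch: ranking the WDBC features by mutual information with the target
# class, in the spirit of Eq. (2) and Table I. mutual_info_classif uses a
# k-nearest-neighbour MI estimator, so scores approximate the table above.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

mi = mutual_info_classif(X, y, random_state=0)        # one MI score per feature
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking)                                         # features in descending MI order
```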
3) Feature Selection Based on Pearson Correlation and Mutual Information (PC-MI): The filter method is considered the best, least complicated, and least costly way to select features, because it selects features through correlation analysis, independently of the ML model used [21]. PC has been used to find the degree of relationship between one feature and another in the dataset, while MI, used separately, helps determine the degree of relationship of each feature to the target class. Therefore, these two methods will