
Received: 7 December 2022 | Revised: 4 January 2023 | Accepted: 10 January 2023

DOI: 10.37917/ijeee.19.2.6                                        Vol. 19 | Issue 2 | December 2023

                                                                                 Open Access

Iraqi Journal for Electrical and Electronic Engineering

Original Article

   Using Pearson Correlation and Mutual Information (PC-MI) to Select Features for Accurate Breast Cancer Diagnosis Based on a Soft Voting Classifier

                                                           Mohammed S. Hashim*, Ali A. Yassin
              Department of Computer Science, Education College for Pure Sciences, University of Basrah, Basrah, 61004, Iraq

Correspondence
*Mohammed S. Hashim
Department of Computer Science,
Education College for Pure Sciences,
University of Basrah, Basrah, Iraq
Email: moh.salah@uobasrah.edu.iq

  Abstract
  Breast cancer is one of the most critical diseases worldwide and one of the most common health risks that people face.
  It is a leading cause of cancer death among women around the world, and early detection is difficult. In healthcare,
  where early diagnosis based on machine learning (ML) helps save patients' lives, better-performing diagnostic procedures
  are crucial, and ML models have been used to improve the effectiveness of early diagnosis. In this paper, we propose a
  new feature selection method that combines two filter methods, Pearson correlation and mutual information (PC-MI), to
  analyse the correlation amongst features and select the important ones before passing them to a classification model.
  Our method predicts breast cancer early and relies on a soft voting classifier that combines a set of ML models (decision
  tree, logistic regression and support vector machine) into a single model that carries the strengths of its members,
  yielding the best prediction accuracy. We evaluate our work on the Wisconsin Diagnostic Breast Cancer dataset. The
  proposed methodology outperforms previous work, achieving 99.3% accuracy, an F1 score of 0.9922, a recall of 0.9846,
  a precision of 1 and an AUC of 0.9923. Furthermore, the 10-fold cross-validation accuracy is 98.2%.

  Keywords
  Breast Cancer, Feature Selection, Soft Voting Classifier, Cross-Validation.

                  I. INTRODUCTION

   Breast cancer is one of the most well-known and common diseases in the world, and its prevalence has been steadily
rising in recent years. Women are the most likely to suffer from breast cancer: according to World Health Organization
reports, 2.3 million new cases and 685,000 deaths have been recorded. This cancer manifests as a lump in the breast that
can be either benign or malignant and, in the latter case, can spread to other parts of the body. Breast cancer risk is
significantly influenced by genetic mutations [1].

   Adopting early detection methods aids the treatment of this tumour and raises the likelihood of survival by 90% [1].
Because computers and other technologies can learn from the data gathered from a patient, identify and diagnose the
disease effectively and provide treatment recommendations, artificial intelligence (AI) and machine learning (ML) assist
clinicians in the early identification of breast cancer [2]. In the medical field, ML algorithms for classification and
prediction are frequently utilised [3], particularly on breast cancer datasets, to determine whether a tumour is benign
or malignant [4].

   Many studies (details in the related work section) have been conducted in the field of early diagnosis of breast can-

This is an open-access article under the terms of the Creative Commons Attribution License,
which permits use, distribution, and reproduction in any medium, provided the original work is properly cited.
©2023 The Authors.
Published by Iraqi Journal for Electrical and Electronic Engineering | College of Engineering, University of Basrah.

https://doi.org/10.37917/ijeee.19.2.6                                            |https://www.ijeee.edu.iq 43