Received: 7 December 2022 | Revised: 4 January 2023 | Accepted: 10 January 2023
DOI: 10.37917/ijeee.19.2.6 Vol. 19 | Issue 2 | December 2023
Open Access
Iraqi Journal for Electrical and Electronic Engineering
Original Article
Using Pearson Correlation and Mutual Information
(PC-MI) to Select Features for Accurate Breast Cancer
Diagnosis Based on a Soft Voting Classifier
Mohammed S. Hashim*, Ali A. Yassin
Department of Computer Science, Education College for Pure Sciences, University of Basrah, Basrah, 61004, Iraq
Correspondence
*Mohammed S. Hashim
Department of Computer Science,
Education College for Pure Sciences,
University of Basrah, Basrah, Iraq
Email: moh.salah@uobasrah.edu.iq
Abstract
Breast cancer is one of the most critical diseases worldwide and one of the most common medical risks that people
face. It is considered a leading cause of death around the world, and early detection is difficult. In healthcare,
early diagnosis based on machine learning (ML) helps save patients’ lives, so better-performing diagnostic
procedures are crucial. ML models have been used to improve the effectiveness of early diagnosis. In this paper,
we propose a new feature selection method that combines two filter methods, Pearson correlation and mutual
information (PC-MI), to analyse the correlation amongst features and then select important features before passing
them to a classification model. Our method is capable of early breast cancer prediction and depends on a soft
voting classifier that combines a set of ML models (decision tree, logistic regression and support vector machine)
into one model that carries the strengths of its constituent models, yielding the best prediction accuracy. Our
work is evaluated on the Wisconsin Diagnostic Breast Cancer dataset. The proposed methodology outperforms previous
work, achieving 99.3% accuracy, an F1 score of 0.9922, a recall of 0.9846, a precision of 1 and an AUC of 0.9923.
Furthermore, the accuracy of 10-fold cross-validation is 98.2%.
Keywords
Breast Cancer, Feature Selection, Soft Voting Classifier, Cross-Validation.
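The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy values and the class ordering (0 for benign, 1 for malignant) are assumptions made for illustration, and the abstract does not state how the Pearson and mutual information scores are combined, so the sketch only computes each score separately. The soft voting step averages the class probabilities produced by several models and picks the class with the highest average.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def soft_vote(probabilities):
    """Average per-model class probabilities; return (winning class, averages)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    avg = [sum(p[k] for p in probabilities) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Hypothetical toy data (not taken from the dataset used in the paper).
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_b = [2.1, 4.0, 6.2, 8.1, 9.9, 12.2]  # nearly a multiple of feature_a
binned_feature = [0, 0, 0, 1, 1, 1]          # feature_a split at its median
labels = [0, 0, 0, 1, 1, 1]                  # assumed: 0 = benign, 1 = malignant

print("feature-feature Pearson r:", pearson(feature_a, feature_b))
print("feature-label MI (nats):", mutual_information(binned_feature, labels))

# Hypothetical class probabilities from three models for one sample.
model_probs = [[0.45, 0.55], [0.45, 0.55], [0.95, 0.05]]
print("soft vote:", soft_vote(model_probs))
```

In the toy sample, two of the three models lean slightly towards class 1, yet the averaged probabilities favour class 0 because the third model is far more confident. This is the behaviour that distinguishes soft voting from a simple majority vote.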
I. INTRODUCTION
Breast cancer is one of the most well-known and common diseases in the world, and its prevalence has been steadily
rising in recent years. Women are the most likely to suffer from breast cancer; according to World Health
Organization reports, 2.3 million cases and 685,000 deaths have been recorded. This cancer manifests as a lump in
the breast that can be either benign or malignant and, in the latter case, can spread to other parts of the body.
Breast cancer risk is significantly influenced by genetic mutations [1].
Early detection methods aid in the treatment of this tumour and can raise the likelihood of survival by 90% [1].
Because computers can learn to identify and diagnose the disease effectively and to provide treatment
recommendations based on data gathered from the patient, artificial intelligence (AI) and machine learning (ML)
assist clinicians in the early identification of breast cancer [2]. In the medical field, ML algorithms for
classification and prediction are frequently utilised [3], particularly on datasets related to breast cancer, to
determine whether a tumour is benign or malignant [4].
Many studies (details in the related work section) have been conducted in the field of early diagnosis of breast can-
This is an open-access article under the terms of the Creative Commons Attribution License,
which permits use, distribution, and reproduction in any medium, provided the original work is properly cited.
©2023 The Authors.
Published by Iraqi Journal for Electrical and Electronic Engineering | College of Engineering, University of Basrah.
https://doi.org/10.37917/ijeee.19.2.6 | https://www.ijeee.edu.iq