Page 48 - 2023-Vol19-Issue2
44 | Hashim & Yassin
cer using AI models. Some of these studies employ ML algorithms (models) to determine whether the tumour is benign or malignant. Some studies use artificial neural networks, and some use an ensemble classifier, that is, a classifier that combines several models.

Many feature selection methods have been used in these studies; some depend on the filter method and some on the wrapper method. The wrapper method removes redundant features that affect the model learning process and lead to a large classification error.

These studies are limited in terms of the accuracy of diagnosis and prediction. There are several reasons for this, including the imbalance of the dataset, which biases the ML model towards the majority class [5]. Moreover, previous work was limited to the available feature selection methods and did not use a method that combines two simple methods to yield the best results at the lowest cost; it also evaluated models one by one to find the best classifier among them.

The following is a summary of the study's main contributions:

• To attain the best diagnosis accuracy, we adopted a new feature selection method called PC-MI that combines two methods, namely, correlation analysis based on Pearson correlation and feature selection based on mutual information.

• The dataset is balanced using SMOTE to avoid biasing the ML model towards a particular class.

• A soft voting classifier is used, which integrates three models into one model that carries the strengths of these models.

• The impact of employing a soft voting classifier on prediction accuracy was analysed.

• To help physicians diagnose accurately, a web page was designed that diagnoses the type of breast cancer tumour.

The remainder of the paper is organised as follows. Section II presents previous studies related to our work. Section III provides a detailed explanation of the proposed methodology for diagnosing the type of breast cancer tumour. Section IV presents and discusses the results obtained using the proposed methodology. Section V presents the conclusion.

II. RELATED WORK

Given the pressing need for accurate diagnosis, healthcare is one of the most significant fields in which AI has been employed. ML and deep learning algorithms have been tested on several datasets related to breast cancer, and they produce classification results with a high degree of accuracy. In this section, we present some previous studies related to diagnosing breast cancer using AI algorithms.

Hazra et al. [6] used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, performing feature selection with the Pearson correlation (PC) coefficient to obtain the smallest number of features. These features were passed to three models, namely, support vector machine, naïve Bayes and ensemble classifiers, to compare the results and identify the best model for classifying the disease. The results showed that the support vector machine with 19 features had the best accuracy of 98.51%.

Khuriwal and Mishra [7] used the WDBC dataset in their study and applied chi-square as a feature selection method to filter the dataset and keep the features that best diagnose the type of tumour present. Only 16 features were selected, and these features were passed to a voting classifier that included logistic regression (LR) and an artificial neural network; this classifier achieved an accuracy of 98.50%. Allam and Nandhini [8] used binary teaching learning-based optimisation, one of the wrapper methods for selecting the best features that represent a dataset. In their study, they used the WDBC dataset to diagnose tumour type. Five classification models were applied: support vector machine (SVM), discriminant analysis, decision tree (DT), k-nearest neighbours (KNN) and naïve Bayes; amongst them, SVM gave the highest accuracy of 98.43% with nine features.

Memon et al. [9] applied recursive feature elimination (RFE) as a feature selection method on the WDBC dataset to diagnose whether the breast cancer tumour is benign or malignant. This method produced 18 features out of 30 that were passed to the SVM model, which achieved high specificity (99%), accuracy (99%) and sensitivity (98%). Dhahri et al. [10] used genetic programming to select the best features from the WDBC dataset. This method extracted 12 features out of 30, and several models were then compared on this dataset. These models were AdaBoost, LR, Gaussian naïve Bayes, quadratic discriminant analysis, random forest, gradient boosting, SVM, linear discriminant analysis, KNN, DT and extra trees classifier. Of these, the AdaBoost classifier obtained the highest accuracy, at 98.24%.

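
The studies surveyed in this section are compared by accuracy, and in some cases by sensitivity and specificity. As a reminder of how these figures relate to each other, the following minimal sketch computes all three from a binary confusion matrix. It is only an illustration: the function name `binary_metrics`, the choice of malignant (label 1) as the positive class, and the toy labels are assumptions made here and are not taken from any of the cited studies.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for binary labels.

    `positive` marks the class treated as positive (assumed here to be
    1 = malignant); every other label is treated as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    total = tp + tn + fp + fn
    return {
        # Share of all cases that were classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Share of positive (malignant) cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of negative (benign) cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy labels (1 = malignant, 0 = benign), for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On an imbalanced dataset, accuracy alone can look high even when sensitivity is poor, which is one reason to read the three values together.
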
Ibrahim et al. [11] used two methods of feature selection in their study, which are correlation analysis and principal component analysis, and wrapper methods to select the