Page 144 - IJEEE-2022-Vol18-ISSUE-1

140 |                                                                                                                Atiyah & Thalij

Unit (ICU) or would not require one. It also compares the algorithms' false-positive and false-negative misclassifications in order to select the ones that offer the best accuracy and speed of prognosis, helping doctors recognize COVID-19 and avoid diagnostic mistakes.

II. RELATED WORK

COVID-19 is associated with the rapid evolution of databases, so there is a need to analyze the relationships and hierarchy of the data using ML algorithms to help the health system diagnose COVID-19 [6]. This paper analyzes recent studies in this area:

Sarwar et al. [7] diagnosed diabetes using machine learning algorithms and reported an accuracy of 98.60%. Such methods can be helpful for forecasting COVID-19: accurate identification of COVID-19 can save many lives, and the enormous amount of data being produced can be used to train the algorithms. ML can therefore make useful contributions in this area by performing prognoses based on images, radiography, clinical text, etc.

Iwendi et al. [8] presented a fine-tuned RF model as well as an AdaBoost model. To predict the likely outcome, the system uses the spatial, demographic, and health details of COVID-19 patients; the results were an F1-score of 0.86 and an accuracy of 94%. Analysis of the data indicated a strong relationship between patient death and patient gender, and most patients were between 20 and 70 years of age.

Bayat et al. [9] presented a system to predict COVID-19 based on standardized laboratory tests. A large dataset of 75,991 cases was obtained from the US Department of Veterans Affairs, and XGBoost (XGB) was used to build the model, which achieved an accuracy of 86.4%, a specificity of 86.8%, and a sensitivity of 82.4%. The work also ranked the top ten features in descending order of importance.

Zhou et al. [10] presented a system to predict the severity of COVID-19 in infected patients. They used a dataset of 377 cases (172 severe and 106 non-severe) from a hospital in China, and logistic regression was used to build the prediction system, which achieved an AUC of 87.9%, a sensitivity of 88.6%, and a specificity of 73.7%. The results identified three separate factors strongly linked to COVID-19 infection: C-reactive protein, age, and D-dimer.

III. METHODOLOGY

The main stages of the methodology used in this study are shown in Fig. 1. Performance is measured using specificity, sensitivity, accuracy, precision, ROC_AUC_Score, the positive and negative prevalence, misclassification, and execution time. We used Python to build the classification system (dataset preparation, pre-processing, data analysis, feature scaling, data splitting, and the classification algorithms). In the first model [11], we use Stochastic Gradient Descent (SGD), Naïve Bayes (NB), Logistic Regression (LR), and Random Forest (RF). In the second model [12], we use Support Vector Machines (SVM), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and K-Nearest Neighbors (KNN).

Fig. 1. The method diagram: Preparation of dataset → Pre-processing of dataset → Analysis of data → Scaling of features → Data splitting → Algorithms of classification → Performance measurement.

A. Preparation of Dataset
We obtained the dataset by searching for it with the Google search engine. It comes from an open-source repository that contains the most suitable information and details on COVID-19. The file is in .xlsx format and contains 1925 rows and 231 columns [13].

B. Pre-processing of Dataset
The dataset used in this work contains 1925 instances and 231 features. Before the model is implemented, the data should be put into a better form that meets the consistency requirements for processing. Pre-processing has two main stages: handling missing values and encoding the data for classification.

C. Analysis of Data
Data analysis is the process of examining, modeling, and visualizing data to extract useful information and knowledge and to draw conclusions; it plays an important role in decision-making.

D. Scaling of Features
The large scale of the values and the discrepancies between entries in the dataset make the data challenging to model. The ranges of the values should therefore be made compatible across the dataset to obtain an efficient model and to speed up computation in the