COVID-19 emerged in China in 2019, spread rapidly worldwide, and caused widespread illness and many deaths. Accurate and early detection of COVID-19 can support the long-term survival of patients and help limit the spread of the epidemic. COVID-19 case classification techniques help health organizations quickly identify and treat severe cases. Classification algorithms are essential for forecasting and decision-making: they assist diagnosis, support early identification of COVID-19, and identify cases that require admission to the intensive care unit (ICU) so that treatment is delivered at the appropriate time. This paper compares machine learning classification algorithms for diagnosing COVID-19 cases, measures their performance with several metrics, and measures mislabeling (false positives and false negatives) to identify the best algorithms in terms of diagnostic speed and accuracy. We load the dataset and then perform dataset preparation, pre-processing, data analysis, feature selection, data splitting, and classification. In the first experiment, four classification algorithms (Stochastic Gradient Descent, Logistic Regression, Random Forest, and Naive Bayes) achieved accuracies of 99.61%, 94.82%, 98.37%, and 96.57%, with execution times of 0.01 s, 0.7 s, 0.20 s, and 0.04 s, respectively. Stochastic Gradient Descent had the best mislabeling results. In the second experiment, four further classification algorithms (eXtreme Gradient Boosting, Decision Tree, Support Vector Machines, and K-Nearest Neighbors) achieved accuracies of 98.37%, 99%, 97%, and 88.4%, with execution times of 0.18 s, 0.02 s, 0.3 s, and 0.01 s, respectively. Decision Tree had the best mislabeling results. Using machine learning can help improve the allocation of medical resources and maximize their utilization.
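The comparison described above rests on three quantities per classifier: accuracy, mislabeling counts (false positives and false negatives), and execution time. The following is a minimal sketch of how these could be tallied, not the paper's actual pipeline. The toy samples, the labels, and the two threshold "classifiers" are hypothetical stand-ins for the real dataset and the eight algorithms.

```python
import time

def evaluate(predict, samples, labels):
    """Return accuracy, false positives, false negatives and elapsed seconds."""
    start = time.perf_counter()
    predictions = [predict(x) for x in samples]
    elapsed = time.perf_counter() - start

    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # flagged, but not a true case
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # missed true case
    return {
        "accuracy": (tp + tn) / len(labels),
        "false_positives": fp,
        "false_negatives": fn,
        "seconds": elapsed,
    }

# Hypothetical toy data: one score per patient, label 1 = positive case.
samples = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6, 0.2, 0.7]
labels = [0, 0, 1, 1, 1, 0, 0, 1]

# Hypothetical stand-ins for trained models: simple threshold rules.
classifiers = {
    "threshold_0.5": lambda x: 1 if x >= 0.5 else 0,
    "threshold_0.3": lambda x: 1 if x >= 0.3 else 0,
}

results = {name: evaluate(clf, samples, labels) for name, clf in classifiers.items()}
for name, metrics in results.items():
    print(name, metrics)
```

On this toy data both rules reach the same accuracy (0.75) but differ in how they mislabel: the 0.5 threshold yields one false positive and one false negative, while the 0.3 threshold yields two false positives and no false negatives. This is why mislabeling is worth reporting alongside accuracy when missed cases are costly.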
Classifying clinical data from confirmed COVID-19 cases can help predict whether or not a patient will need to be moved to the ICU. A global dataset of COVID-19 cases is used because of its accuracy and quality.
Low-quality data can be dangerous for machine learning models, especially in critical situations. Some large-scale datasets contain low-quality data and incorrect labels, and image datasets are likely to contain artifacts and biases caused by measurement errors. Automatic algorithms that can recognize low-quality data are therefore needed. In this paper, the Shapley value, a metric for data valuation, is used to quantify the contribution of training data to the performance of a classification algorithm on a large ImageNet dataset. We demonstrate the success of Data Shapley in distinguishing low-quality data from valuable data for classification. We find that model performance increases when data with low Shapley values are removed, whereas classification performance declines when data with high Shapley values are removed. Moreover, high-Shapley-value data contained more correctly labeled samples, while low-Shapley-value data contained more mislabeled samples. The results indicate that mislabeled or poor-quality images have low Shapley values, whereas data that are valuable for classification have high Shapley values.
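To make the idea of valuing training data concrete, the sketch below computes exact Shapley values for a tiny training set by averaging each point's marginal contribution to validation accuracy over all orderings. It is an illustration under stated assumptions, not the procedure used in the paper: the one-dimensional toy data, the 1-nearest-neighbour classifier, and the convention that an empty training set scores zero are all hypothetical choices. Exact enumeration is feasible only for a handful of points, so a dataset of ImageNet scale would need an approximation.

```python
from itertools import permutations

# Hypothetical toy data as (feature, label) pairs.
# The last training point is deliberately mislabeled (it sits among the 0s).
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (0.5, 1)]
valid = [(0.4, 0), (0.7, 0), (4.4, 1), (4.8, 1)]

def utility(indices):
    """Validation accuracy of a 1-nearest-neighbour classifier trained on train[indices]."""
    if not indices:
        return 0.0  # assumption: an empty training set scores zero
    correct = 0
    for x, y in valid:
        nearest = min(indices, key=lambda i: abs(train[i][0] - x))
        correct += train[nearest][1] == y
    return correct / len(valid)

def data_shapley(n):
    """Exact Shapley value of each training point, averaged over all orderings."""
    values = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        subset, previous = [], utility([])
        for i in order:
            subset.append(i)
            current = utility(subset)
            values[i] += current - previous  # marginal contribution of point i
            previous = current
    return [v / len(orders) for v in values]

shapley = data_shapley(len(train))
everything = list(range(len(train)))
lowest = min(everything, key=lambda i: shapley[i])
kept = [i for i in everything if i != lowest]

print("Shapley values:", [round(v, 3) for v in shapley])
print("accuracy with all points:", utility(everything))
print("accuracy without lowest-valued point:", utility(kept))
```

In this toy setting the mislabeled point receives the only negative value, and removing it raises validation accuracy from 0.5 to 1.0, which mirrors the behaviour reported above for low-Shapley-value data.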