Vol. 18 No. 2 (2022)

Published: December 31, 2022

Pages: 9-14

Original Article

Shapley Value is an Equitable Metric for Data Valuation

Abstract

Low-quality data can harm machine learning models, especially in critical applications. Some large-scale datasets contain low-quality samples and incorrect labels, and image datasets are likely to contain artifacts and biases arising from measurement errors. Automatic algorithms that can identify low-quality data are therefore needed. In this paper, the Shapley value, a metric for data valuation, is used to quantify the contribution of training data to the performance of a classification algorithm on the large ImageNet dataset. We assess how well data Shapley distinguishes low-quality data from valuable data for classification. We find that model performance increases when data with low Shapley values are removed, whereas classification performance declines when data with high Shapley values are removed. Moreover, high-Shapley-value data contained more correctly labeled samples, and low-Shapley-value data contained more mislabeled samples. The results indicate that mislabeled or poor-quality images tend to receive low Shapley values, while data that are valuable for classification receive high Shapley values.
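The idea of valuing each training sample by its Shapley value can be illustrated with a minimal sketch. It is not the paper's pipeline: it assumes a toy one-dimensional dataset, a 1-nearest-neighbour classifier in place of an image model, exact enumeration of all orderings (feasible only for a handful of points), and a chance-level utility of 0.5 for the empty training set. The names `knn1_accuracy` and `exact_data_shapley` and all data values are hypothetical.

```python
from itertools import permutations

CHANCE_ACCURACY = 0.5  # assumption: utility of an empty training set (balanced binary task)

def knn1_accuracy(train, val):
    """Utility: validation accuracy of a 1-nearest-neighbour classifier."""
    if not train:
        return CHANCE_ACCURACY
    correct = 0
    for x, y in val:
        # Predict the label of the closest training point.
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += int(nearest[1] == y)
    return correct / len(val)

def exact_data_shapley(train, val, utility=knn1_accuracy):
    """Average each point's marginal gain in utility over all orderings."""
    n = len(train)
    totals = [0.0] * n
    n_orders = 0
    for order in permutations(range(n)):
        subset = []
        prev = utility(subset, val)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, val)
            totals[i] += cur - prev  # marginal contribution of point i
            prev = cur
        n_orders += 1
    return [t / n_orders for t in totals]

# Toy data as (feature, label). Index 2 is deliberately mislabeled:
# it lies among the class-0 points but carries label 1.
train = [(0.0, 0), (1.0, 0), (1.4, 1), (4.0, 1), (5.0, 1)]
val = [(0.4, 0), (1.5, 0), (3.6, 1), (4.6, 1)]

values = exact_data_shapley(train, val)
lowest = min(range(len(train)), key=lambda i: values[i])
cleaned = [p for i, p in enumerate(train) if i != lowest]

print("Shapley values:", [round(v, 3) for v in values])
print("accuracy with all points:", knn1_accuracy(train, val))
print("accuracy without lowest-valued point:", knn1_accuracy(cleaned, val))
```

In this toy setup the mislabeled point receives the lowest value, and removing it raises validation accuracy, which mirrors the removal experiments summarized in the abstract. For datasets of realistic size, exact enumeration is intractable and the values would have to be approximated, for example by sampling random orderings.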
