Cover
Vol. 20 No. 1 (2024)

Published: June 30, 2024

Pages: 271-285

Original Article

A Comparative Evaluation of Initialization Strategies for K-Means Clustering with Swarm Intelligence Algorithms

Abstract

Clustering is a fundamental yet challenging data analysis task, and choosing a proper centroid initialization technique is critical to the success of clustering algorithms such as k-means. The current work investigates six established methods (random, Forgy, k-means++, PCA, hierarchical clustering, and naive sharding) and three swarm intelligence-based approaches to initializing k-means: Spider Monkey Optimization (SMOKM), the Whale Optimization Algorithm (WOAKM), and the Grey Wolf Optimizer (GWOKM). Results on ten well-known datasets strongly favor the swarm intelligence-based techniques, with SMOKM consistently outperforming WOAKM and GWOKM. Swarm intelligence, especially SMOKM, effectively generates distinct and well-separated clusters, which is valuable in resource-constrained settings. The study also sheds light on the performance of traditional methods such as hierarchical clustering, PCA, and k-means++, which, while promising on specific datasets, consistently underperform the swarm intelligence-based alternatives. In conclusion, the current work contributes essential insights into selecting and evaluating centroid initialization techniques for k-means clustering, highlights the superiority of swarm intelligence (particularly SMOKM), and provides actionable guidance for addressing diverse clustering challenges.
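To illustrate what centroid seeding involves, the following is a minimal NumPy sketch of k-means++ seeding, one of the six baseline methods evaluated above. This is not the paper's own code; the function name, the synthetic two-blob data, and the random seed are assumptions made purely for illustration.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick the first centroid uniformly at random, then
    pick each subsequent centroid with probability proportional to its
    squared distance from the nearest centroid chosen so far."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen centroid
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1).min(axis=1)
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)

rng = np.random.default_rng(0)
# two well-separated synthetic blobs around (0, 0) and (5, 5)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
C = kmeans_pp_init(X, 2, rng)
print(C.shape)  # (2, 2)
```

Because points far from all chosen centroids are favored, the seeds tend to spread across the data, which is exactly the property the initialization methods compared in this study are judged on.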
