Vol. 17 No. 1 (2021)

Published: June 30, 2021

Pages: 76-87

Review Article

Local and Global Outlier Detection Algorithms in Unsupervised Approach: A Review

Abstract

Outlier detection is one of the most important problems in data analysis due to its applicability in several well-known domains, including intrusion detection, security, banking, fraud detection, and the discovery of criminal activities in electronic commerce. Anomaly detection comprises two main approaches: supervised and unsupervised. The supervised approach requires predefined information, namely the type of outliers, which is difficult to specify in some applications; the unsupervised approach, in contrast, determines the outliers without human interaction. This paper presents a review of the unsupervised approach, showing its main advantages and limitations in light of the studies performed with the supervised approach. The review indicates that the central problem of the unsupervised approach, tied to algorithm parameterization, is detecting local and global outlier objects simultaneously. Moreover, most algorithms neither rank objects nor quantify their degree of being an outlier or a normal object, and they require the researcher to set different parameters, such as the neighborhood radius, the number of neighbors within that radius, and the number of clusters. This comprehensive and structured overview of a large set of outlier detection algorithms, which emphasizes the limitations of outlier detection in the unsupervised approach, can serve as a guideline for researchers interested in this field.
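
To make these parameters concrete, the following minimal NumPy sketch (an illustration added here, not code taken from the reviewed algorithms or their authors) contrasts a global k-nearest-neighbor distance score with a simplified local outlier factor (LOF); the neighborhood size k and the toy two-cluster dataset are assumed purely for demonstration.

import numpy as np


def knn_info(X, k):
    """Return pairwise distances, each point's k nearest neighbors,
    and its k-distance (distance to the k-th nearest neighbor)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                                 # exclude each point itself
    nn_idx = np.argsort(d, axis=1)[:, :k]                       # indices of the k nearest neighbors
    k_dist = np.take_along_axis(d, nn_idx, axis=1)[:, -1]       # distance to the k-th neighbor
    return d, nn_idx, k_dist


def global_knn_score(X, k):
    """Global score: distance to the k-th nearest neighbor (larger = more outlying)."""
    _, _, k_dist = knn_info(X, k)
    return k_dist


def local_outlier_factor(X, k):
    """Simplified LOF: ratio of the neighbors' local reachability density to the
    point's own density; values well above 1 suggest a local outlier."""
    d, nn_idx, k_dist = knn_info(X, k)
    reach = np.maximum(k_dist[nn_idx], np.take_along_axis(d, nn_idx, axis=1))  # reachability distances
    lrd = 1.0 / reach.mean(axis=1)                                             # local reachability density
    return lrd[nn_idx].mean(axis=1) / lrd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data (assumed for illustration): a dense cluster, a sparse cluster,
    # and one isolated point between them.
    X = np.vstack([rng.normal(0, 0.1, (30, 2)),
                   rng.normal(5, 1.0, (30, 2)),
                   [[2.5, 2.5]]])
    print("global k-NN score of the isolated point:", global_knn_score(X, k=5)[-1])
    print("LOF of the isolated point:", local_outlier_factor(X, k=5)[-1])

With this toy data, the isolated point should receive both the largest k-NN distance and an LOF well above 1, while points in the sparse cluster keep LOF values near 1 despite their larger k-NN distances, illustrating the local-versus-global distinction discussed above.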
