The volume of data generated by many applications is increasing drastically, and finding an optimal subset of features from the data has become a crucial task. The main objective of this review is to analyze and compare different stochastic local search algorithms for finding an optimal feature subset. Simulated annealing, tabu search, genetic programming, genetic algorithm, particle swarm optimization, artificial bee colony, grey wolf optimization, and bat algorithm, all of which have been used in feature selection, are discussed. This review also covers the filter and wrapper approaches to feature selection. Furthermore, it describes the main components of stochastic local search algorithms, categorizes these algorithms by type, and discusses promising directions for future research on such algorithms in feature selection.
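As an illustration of the kind of stochastic local search the review covers, the following is a minimal sketch of simulated annealing over feature subsets. It is not taken from the review: the function name `simulated_annealing_fs`, the single-bit-flip neighbourhood, the geometric cooling schedule, and the toy objective `toy_score` are all assumptions chosen for brevity. In a real wrapper approach, the score would be the validation performance of a classifier trained on the selected features.

```python
import math
import random


def simulated_annealing_fs(n_features, score, n_iter=500, t0=1.0, cooling=0.99, seed=0):
    """Search for a feature subset (boolean mask) that maximizes score(mask)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(n_iter):
        # Neighbour: flip one randomly chosen feature in or out of the subset.
        cand = current[:]
        j = rng.randrange(n_features)
        cand[j] = not cand[j]
        cand_score = score(cand)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best, best_score


# Hypothetical toy objective standing in for a wrapper evaluation:
# features 0, 2 and 5 are useful, and every selected feature costs 0.1.
RELEVANT = {0, 2, 5}


def toy_score(mask):
    hits = sum(1 for i, on in enumerate(mask) if on and i in RELEVANT)
    return hits - 0.1 * sum(mask)


best_mask, best_value = simulated_annealing_fs(8, toy_score)
selected = [i for i, on in enumerate(best_mask) if on]
print(selected, round(best_value, 2))
```

The same loop becomes a filter-style search if `score` is replaced by a statistic computed directly from the data, such as a correlation or mutual information criterion, rather than by the performance of a trained model.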
Breast cancer is one of the most critical diseases affecting people around the world and one of the most common medical risks they face. It is considered one of the leading causes of death worldwide, and early detection is difficult. In healthcare, where early diagnosis based on machine learning (ML) helps save patients’ lives, better-performing diagnostic procedures are crucial, and ML models have been used to improve the effectiveness of early diagnosis. In this paper, we propose a new feature selection method that combines two filter methods, Pearson correlation and mutual information (PC-MI), to analyse the correlation amongst features and select the important ones before passing them to a classification model. Our method is capable of early breast cancer prediction and relies on a soft voting classifier that combines a set of ML models (decision tree, logistic regression and support vector machine) into one model that carries the strengths of its constituents, yielding the best prediction accuracy. Our work is evaluated on the Wisconsin Diagnostic Breast Cancer dataset. The proposed methodology outperforms previous work, achieving 99.3% accuracy, an F1 score of 0.9922, a recall of 0.9846, a precision of 1 and an AUC of 0.9923. Furthermore, the accuracy under 10-fold cross-validation is 98.2%.
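The abstract does not specify how the two filters are combined or how the votes are weighted, so the following is only a rough sketch under stated assumptions: features are ranked by mutual information with the label, a feature is skipped when its absolute Pearson correlation with an already selected feature reaches a threshold, and soft voting is taken as the unweighted mean of per-class probabilities. The function names, the threshold of 0.9, the equal-width binning, and the toy values are hypothetical and are not the authors' implementation or data.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_information(x, y, bins=4):
    """MI in nats between a continuous feature (equal-width bins) and discrete labels."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    pxy, px, py = Counter(zip(xb, y)), Counter(xb), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def pc_mi_select(columns, labels, corr_threshold=0.9, k=2):
    """Assumed combination: rank by MI, then drop features redundant by |Pearson r|."""
    ranked = sorted(columns, key=lambda name: mutual_information(columns[name], labels),
                    reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[s])) < corr_threshold for s in selected):
            selected.append(name)
        if len(selected) == k:
            break
    return selected


def soft_vote(prob_lists):
    """Average per-class probabilities from several models and return the argmax class."""
    n_models = len(prob_lists)
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(len(prob_lists[0]))]
    return avg.index(max(avg)), avg


# Hypothetical toy data: "perimeter" is exactly 2 * "radius", so it is redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "radius": [1.0, 1.2, 1.1, 1.3, 3.0, 3.2, 3.1, 3.3],
    "perimeter": [2.0, 2.4, 2.2, 2.6, 6.0, 6.4, 6.2, 6.6],
    "texture": [1.0, 3.0, 1.2, 3.1, 2.9, 1.1, 3.2, 2.8],
}
chosen = pc_mi_select(columns, labels)

# Hypothetical class probabilities from three models for a single sample.
vote, avg_probs = soft_vote([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
print(chosen, vote, [round(p, 3) for p in avg_probs])
```

With soft voting, a single confident model can outweigh two lukewarm ones, which is the main difference from majority (hard) voting. With scikit-learn, the comparable building blocks would be `mutual_info_classif` and `VotingClassifier(voting="soft")`.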
Early in the 21st century, technological advancements significantly increased the importance of digital marketing as the need for digital customer experience, promotion, and distribution emerged. Since 1988, when the term “Digital Marketing” first appeared, the business sector has grown drastically, from small startups to massive corporations operating on a global scale. Marketers must navigate a chaotic environment created by the vast volume of generated data, and decision-makers must contend with user data that is dynamic and changes every day. Enterprises must therefore use smart applications to better evaluate, classify, enhance, and target audiences. Tech-savvy customers are pushing businesses to make larger financial investments and adopt cutting-edge technologies. It was only natural that marketing and trade would be among the areas to adopt such developments, which speed up reach and advertising and make it easier to reach and win customers. In this study, we used machine learning (ML) algorithms (decision tree (DT), K-nearest neighbor (KNN), CatBoost, and random forest (RF)) to classify customer data and improve the ability to forecast customer behavior, so that businesses can gain more business from customers more quickly and easily. The proposed system was tested on a marketing dataset. The results show that the system can accurately predict whether a customer will make a purchase: RF had an accuracy of 0.97, DT 0.95, and KNN 0.91, while CatBoost had an execution time of 15.04 seconds and gave the best results, with the highest F1 score and accuracy (0.91 and 0.98, respectively). Finally, future work involves creating a web page to help banking institutions with fast and accurate forecasting, and applying more feature selection techniques to the marketing dataset to improve prediction.
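Because the comparison above rests on accuracy and F1 score, a short sketch of how the two metrics are computed for a binary purchase label may help. The labels, the predictions, and the names `model_a` and `model_b` are hypothetical and do not correspond to the study's data or models.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 computed for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical ground truth (1 = customer bought) and predictions from two models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
}
scores = {name: accuracy_and_f1(y_true, y_pred) for name, y_pred in predictions.items()}
for name, (acc, f1) in scores.items():
    print(f"{name}: accuracy={acc:.3f}, f1={f1:.3f}")
```

Accuracy and F1 can rank models differently when the classes are imbalanced, which is why abstracts such as this one report both.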