Recently, numerous studies have emphasized the importance of professional inspection and repair when faults are suspected in Photovoltaic (PV) systems. By leveraging electrical and environmental features, machine learning models can provide valuable insights into the operational status of PV systems. In this study, different machine learning models for PV fault detection were developed and evaluated using a simulated 0.25 MW PV power system. The training and testing datasets encompassed normal operation and various fault scenarios, including string-to-string, on-string, and string-to-ground faults. Multiple electrical and environmental variables, such as current, voltage, power, temperature, and irradiance, were measured and used as features. Four algorithms (Tree, LDA, SVM, and ANN) were tested using 5-fold cross-validation to identify faults in the PV system. The performance evaluation revealed promising results, with all algorithms demonstrating high accuracy. The Tree and LDA algorithms performed best: the Tree model achieved 99.544% accuracy on the training data and 98.058% on the testing data, while LDA achieved perfect accuracy (100%) on the testing data; SVM and ANN achieved 95.145% and 89.320% testing accuracy, respectively. These findings underscore the potential of machine learning algorithms for accurately detecting and classifying various types of PV faults.
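A minimal sketch of the evaluation described above, assuming a tabular dataset of PV measurements; the CSV file name, feature columns, and fault-label column are hypothetical placeholders, not the authors' data.

```python
# Illustrative sketch (not the authors' code): evaluating the four classifier
# families named above with 5-fold cross-validation on tabular PV features.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("pv_measurements.csv")          # hypothetical file
X = df[["current", "voltage", "power", "temperature", "irradiance"]]
y = df["fault_class"]                            # normal / string-to-string / ...

models = {
    "Tree": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "ANN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3%}")
```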
COVID-19 emerged in China in 2019, spread rapidly worldwide, and caused many severe illnesses and deaths. Accurate and early detection of COVID-19 can improve patients' long-term survival and help curb the spread of the epidemic. COVID-19 case classification techniques help health organizations quickly identify and treat severe cases. Classification algorithms are essential for forecasting and decision-making that support diagnosis, early identification of COVID-19, and determination of which cases require the intensive care unit so that treatment is delivered at the appropriate time. This paper compares machine learning classification algorithms for diagnosing COVID-19 cases, measures their performance with several metrics, and measures mislabeling (false positives and false negatives) to identify the best algorithms in terms of speed and diagnostic accuracy. We focus on classifying COVID-19 cases using machine learning algorithms: we load the dataset and perform data preparation, pre-processing, data analysis, feature selection, data splitting, and classification. First, using four classification algorithms (Stochastic Gradient Descent, Logistic Regression, Random Forest, and Naive Bayes), the accuracies were 99.61%, 94.82%, 98.37%, and 96.57%, respectively, and the execution times were 0.01 s, 0.7 s, 0.20 s, and 0.04 s; Stochastic Gradient Descent had the lowest mislabeling. Second, using four further classification algorithms (eXtreme Gradient Boosting, Decision Tree, Support Vector Machines, and K-Nearest Neighbors), the accuracies were 98.37%, 99%, 97%, and 88.4%, respectively, and the execution times were 0.18 s, 0.02 s, 0.3 s, and 0.01 s; the Decision Tree had the lowest mislabeling. Machine learning helps improve the allocation of medical resources and maximize their utilization. Classifying the clinical data of confirmed COVID-19 cases, using a global COVID-19 dataset of good accuracy and quality, can help predict whether or not a patient will need to be admitted to the ICU.
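A sketch of the comparison loop described above, assuming a tabular COVID-19 case dataset with a binary ICU-need label; the file name and label column are placeholders.

```python
# Compare classifiers on accuracy, execution time, and mislabeling (FP/FN).
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("covid_cases.csv")                      # hypothetical file
X, y = df.drop(columns=["icu_needed"]), df["icu_needed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classifiers = {
    "SGD": SGDClassifier(random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=42),
    "NaiveBayes": GaussianNB(),
}
for name, clf in classifiers.items():
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    y_pred = clf.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.2%}, "
          f"time={elapsed:.2f}s, FP={fp}, FN={fn}")
```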
Object detection has become faster and more precise due to improved computer vision systems. Many object detection systems have improved dramatically owing to the introduction of machine learning methods. This study incorporated cutting-edge methods for object detection to obtain high-quality results within a timeframe comparable to human perception. Object detection systems often suffer from poor performance. Therefore, this study proposed a comprehensive method to address this problem using six distinct machine learning approaches: stochastic gradient descent, logistic regression, random forest, decision trees, k-nearest neighbor, and naive Bayes. The system was trained using Common Objects in Context (COCO), the most challenging publicly available dataset, on which a yearly object detection challenge is held. The resulting technique is fast and precise, achieving an object detection accuracy of 97%, making it suitable for applications that demand accurate detection.
Health Information Technology (HIT) provides many opportunities for transforming and improving health care systems. HIT enhances the quality of health care delivery, reduces medical errors, increases patient safety, facilitates care coordination, monitors updated data over time, improves clinical outcomes, and strengthens the interaction between patients and health care providers. Living in modern large cities has a significant negative impact on people's health, for instance, an increased risk of chronic diseases such as diabetes. Given the rising morbidity of the last decade, the number of patients with diabetes worldwide is expected to exceed 642 million by 2040, meaning that one in every ten adults will be affected. Previous research on diabetes mellitus indicates that early diagnosis can reduce death rates and overcome many problems. In this regard, machine learning (ML) techniques show promising results in using medical data to predict diabetes at an early stage and save people's lives. In this paper, we propose an intelligent health care system based on ML methods as a real-time monitoring system to detect diabetes mellitus and examine other health issues such as patients' food and drug allergies. The proposed system uses five machine learning methods: K-Nearest Neighbors, Naïve Bayes, Logistic Regression, Random Forest, and Support Vector Machine (SVM). The system selects the classification method with the highest accuracy to optimize the diagnosis of patients with diabetes. The experimental results show that in the proposed system, the SVM classifier has the highest accuracy of 83%.
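A minimal sketch of the model-selection step described above: train the five classifiers and keep the one with the highest test accuracy. The Pima-style CSV and its "Outcome" column are assumptions, not the authors' exact data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

df = pd.read_csv("diabetes.csv")                         # hypothetical file
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "NaiveBayes": GaussianNB(),
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}
# Pick the classifier with the highest accuracy on the held-out test split.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
best = max(scores, key=scores.get)
print(f"Best model: {best} ({scores[best]:.1%})")
```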
Kinship (familial relationship) detection is crucial in many fields and has applications in biometric security, adoption, forensic investigations, and more. It is also essential during wars and natural disasters such as earthquakes, since it may aid in reunions, missing-person searches, establishing emergency contacts, and providing psychological support. The most common method of determining kinship is DNA analysis, which is highly accurate. Another, noninvasive approach uses facial photos with computer vision and machine learning algorithms for kinship estimation. Each part of the human body carries embedded information that can be extracted and used for identification, verification, or classification of a person. Kinship recognition is based on finding traits that are shared within a family. We investigate the use of hand geometry for kinship detection, which is a new approach. Because the available hand image datasets do not contain kinship ground truth, we created our own dataset. This paper describes the tools, methodology, and details of the collected Mosul Kinship Hand (MKH) image dataset. The MKH images were collected using a mobile phone camera with a suitable setup and consist of 648 images of 81 individuals from 14 families (8 hand positions per person). This paper also presents the use of this dataset for kinship prediction using machine learning. Google MediaPipe was used for hand detection, segmentation, and geometric keypoint localization. Handcrafted feature extraction was used to extract 43 distinctive geometric features from each image. A neural network classifier was designed and trained to predict kinship, yielding about 93% prediction accuracy. The results of this novel approach demonstrate that the hand possesses biometric characteristics that can be used to establish kinship and that the suggested method is a promising kinship indicator.
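A sketch of the landmark-to-feature step, assuming MediaPipe Hands; the pairwise fingertip/wrist distances below are illustrative, not the exact 43 features extracted for the MKH dataset, and the image path is hypothetical.

```python
import itertools
import cv2
import numpy as np
import mediapipe as mp

def hand_geometry_features(image_path):
    image = cv2.imread(image_path)
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark           # 21 hand keypoints
    pts = np.array([(p.x, p.y) for p in lm])
    # Distances between the wrist (0) and the five fingertips (4, 8, 12, 16, 20),
    # normalised by a palm length so that image scale does not matter.
    palm = np.linalg.norm(pts[0] - pts[9]) + 1e-8
    dists = [np.linalg.norm(pts[i] - pts[j]) / palm
             for i, j in itertools.combinations([0, 4, 8, 12, 16, 20], 2)]
    return np.array(dists)

features = hand_geometry_features("hand_sample.jpg")       # hypothetical image
print(features)
```

The resulting feature vectors would then be fed to a neural network classifier, as described above.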
Early in the 21st century, as a result of technological advancements, the importance of digital marketing increased significantly as the need for digital customer experience, promotion, and distribution emerged. Since 1988, when the term "Digital Marketing" first appeared, the business sector has undergone drastic growth, moving from small startups to massive corporations on a global scale. Marketers must navigate a chaotic environment caused by the vast volume of generated data, and decision-makers must contend with the fact that user data is dynamic and changes every day. Smart applications must be used within enterprises to better evaluate, classify, enhance, and target audiences. Tech-savvy customers are pushing businesses to make bigger financial investments and use cutting-edge technologies. It was only natural for marketing and trade to adopt such developments, which speed up outreach and advertising and make it easier to reach and win customers. In this study, we utilized machine learning (ML) algorithms (Decision Tree (DT), K-Nearest Neighbor (KNN), CatBoost, and Random Forest (RF)) to classify customer data and improve the ability to forecast customer behavior, so that more business can be gained from customers more quickly and easily. The suggested system was tested on a marketing dataset. The results show that the system can accurately predict whether a customer will buy something or not: RF had an accuracy of 0.97, DT an accuracy of 0.95, and KNN an accuracy of 0.91, while the CatBoost algorithm had an execution time of 15.04 seconds and gave the best results, with the highest F1 score and accuracy (0.91 and 0.98, respectively). Finally, the study's future goals include creating a web page to help banking institutions with fast and accurate forecasting, and applying more feature selection techniques to the marketing dataset to improve prediction.
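A sketch of the algorithm comparison described above; the bank-marketing-style CSV, its "purchased" label, and the one-hot encoding step are placeholder assumptions.

```python
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from catboost import CatBoostClassifier

df = pd.read_csv("marketing.csv")                          # hypothetical file
y = df["purchased"]                                        # 1 = will buy, 0 = will not
X = pd.get_dummies(df.drop(columns=["purchased"]))         # one-hot encode categoricals
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "DT": DecisionTreeClassifier(random_state=1),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=1),
    "CatBoost": CatBoostClassifier(verbose=0),
}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, pred):.2f}, "
          f"f1={f1_score(y_test, pred):.2f}, time={time.perf_counter() - start:.2f}s")
```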
The advancements in modern-day computing and architectures focus on harnessing parallelism to achieve high-performance computing, resulting in the generation of massive amounts of data. The information produced needs to be represented and analyzed to address various challenges in technology and business domains. The radical expansion and integration of digital devices, networking, data storage, and computation systems are generating more data than ever. Data sets are massive and complex, so traditional learning methods fall short, which has in turn driven the adoption of machine learning techniques to mine the information hidden in unseen data. Interestingly, deep learning finds its place in big data applications. One of the major advantages of deep learning is that its features are not human-engineered. In this paper, we look at various machine learning algorithms that have already been applied to big data problems and have shown promising results. We also look at deep learning as a solution to big data issues that are not efficiently addressed using traditional methods. Deep learning is finding its place in most applications where the critical and dominant 5 Vs of big data arise, and it is expected to perform better.
Data-intensive science is a critical science paradigm that intersects with all other sciences. Data mining (DM) is a powerful and useful technology with a wide range of potential users; it focuses on important, meaningful patterns and discovers new knowledge from a collected dataset. Any predictive task in DM uses attributes to classify instances of unknown class. Classification algorithms are a class of prominent mathematical techniques in DM, and constructing a model is their core aspect. However, their performance depends highly on the algorithm's behavior in manipulating data. Focusing on binarization as a preprocessing approach, this paper analyses and evaluates different classification algorithms when constructing a model, based on accuracy in the classification task. The Modified National Institute of Standards and Technology (MNIST) handwritten digits dataset provided by Yann LeCun was used in the evaluation. The paper focuses on machine learning approaches for handwritten digit detection. Machine learning provides classification methods such as K-Nearest Neighbor (KNN), Decision Tree (DT), and Neural Networks (NN). Results showed that the knowledge-based method, i.e. the NN algorithm, is more accurate in determining the digits, as it reduces the error rate. The implication of this evaluation is to provide essential insights for computer scientists and practitioners in choosing the DM technique that fits their data.
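A minimal sketch of the evaluation described above: binarize the MNIST pixels as the preprocessing step, then compare KNN, DT, and a neural network. The subset size and network configuration are illustrative choices to keep the example fast.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Binarizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = Binarizer(threshold=127).fit_transform(X)        # binarization preprocessing
X_train, X_test, y_train, y_test = train_test_split(X[:10000], y[:10000],
                                                    test_size=0.2, random_state=0)
for name, clf in {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "DT": DecisionTreeClassifier(random_state=0),
    "NN": MLPClassifier(hidden_layer_sizes=(128,), max_iter=50, random_state=0),
}.items():
    acc = clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: error rate = {1 - acc:.3f}")
```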
The smart classroom is a fully automated classroom where repetitive tasks, including attendance registration, are performed automatically. With recent advances in artificial intelligence, traditional attendance registration methods, which require significant time and effort to complete, have become inadequate. Therefore, researchers have sought alternative ways to accomplish attendance registration, including identification cards, radio frequency, and biometric systems. However, all of these methods face challenges in safety, accuracy, effort, time, and cost. The development of digital image processing techniques, specifically face recognition technology, has enabled automated attendance registration. Face recognition technology is considered the most suitable for this process due to its ability to recognize multiple faces simultaneously. This study developed an integrated attendance registration system based on the YOLOv7 algorithm, which extracts features and recognizes students' faces using a specially collected database of 31 students from Mustansiriyah University. A comparative study was conducted by applying the YOLOv7 algorithm, a machine learning algorithm, and a combined machine learning and deep learning algorithm. The proposed method achieved an accuracy of up to 100%. A comparison with previous studies demonstrated that the proposed method is promising and reliable for automating attendance registration.
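An illustrative sketch of the attendance-logging step only. The face detector/recognizer is abstracted behind a hypothetical `recognize_students` function (e.g. a YOLOv7-based model returning student IDs per frame); it is not the authors' implementation.

```python
import csv
from datetime import date

def recognize_students(frame):
    """Placeholder: return the set of student IDs recognized in this frame."""
    raise NotImplementedError

def register_attendance(frames, roster, out_csv="attendance.csv"):
    # Collect every student seen in any frame of the classroom video.
    present = set()
    for frame in frames:
        present |= recognize_students(frame)
    # Write a simple per-day attendance sheet for the whole roster.
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "student_id", "status"])
        for student_id in roster:
            status = "present" if student_id in present else "absent"
            writer.writerow([date.today().isoformat(), student_id, status])
```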
Low-quality data can be dangerous for machine learning models, especially in crucial situations. Some large-scale datasets contain low-quality data and false labels, and image datasets in particular may have artifacts and biases arising from measurement errors. Automatic algorithms that can recognize low-quality data are therefore needed. In this paper, the Shapley value, a data valuation metric, is used to quantify the value of training data to the performance of a classification algorithm on a large ImageNet dataset. We assess the success of data Shapley in distinguishing low-quality from valuable data for classification. We find that model performance increases when data points with low Shapley values are removed, whereas classification performance declines when data points with high Shapley values are removed. Moreover, high-Shapley-value data contained more correctly labeled samples, while low-Shapley-value data contained more mislabeled samples. The results show that mislabeled or poor-quality images receive low Shapley values, while data that is valuable for classification receives high Shapley values.
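A minimal sketch of Monte Carlo estimation of data Shapley values, using a small synthetic dataset and logistic regression as the learning algorithm; the original work applies the idea to ImageNet-scale image data, so everything below is a scaled-down illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

def utility(idx):
    """Validation accuracy of a model trained on the subset `idx`."""
    if len(set(y_train[idx])) < 2:           # cannot fit with fewer than two classes
        return max(np.mean(y_val == c) for c in (0, 1))
    model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    return model.score(X_val, y_val)

rng = np.random.default_rng(0)
n = len(y_train)
shapley = np.zeros(n)
n_permutations = 50                           # more permutations -> better estimate
for _ in range(n_permutations):
    perm = rng.permutation(n)
    prev_score = utility([])                  # empty-subset baseline
    for k in range(1, n + 1):
        score = utility(perm[:k])
        # Marginal contribution of the k-th point in this permutation.
        shapley[perm[k - 1]] += (score - prev_score) / n_permutations
        prev_score = score

print("lowest-value (likely mislabeled) points:", np.argsort(shapley)[:5])
```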
Hand gesture recognition is a quickly developing field with many uses in human-computer interaction, sign language recognition, virtual reality, gaming, and robotics. This paper reviews different ways to model hands, such as vision-based, sensor-based, and data glove-based techniques. It emphasizes the importance of accurate hand modeling and feature extraction for capturing and analyzing gestures. Key features like motion, depth, color, shape, and pixel values and their relevance in gesture recognition are discussed. Challenges faced in hand gesture recognition include lighting variations, complex backgrounds, noise, and real-time performance. Machine learning algorithms are used to classify and recognize gestures based on extracted features. The paper emphasizes the need for further research and advancements to improve hand gesture recognition systems’ robustness, accuracy, and usability. This review offers valuable insights into the current state of hand gesture recognition, its applications, and its potential to revolutionize human-computer interaction and enable natural and intuitive interactions between humans and machines. In simpler terms, hand gesture recognition is a way for computers to understand what people are saying with their hands. It has many potential applications, such as allowing people to control computers without touching them or helping people with disabilities communicate. The paper reviews different ways to develop hand gesture recognition systems and discusses the challenges and opportunities in this area.
The ability of the human brain to communicate with its environment has become a reality through the use of Brain-Computer Interface (BCI)-based mechanisms. Electroencephalography (EEG) has gained popularity as a non-invasive means of interfacing with the brain. Traditionally, EEG devices were used in clinical settings to detect various brain diseases. However, as technology advances, companies such as Emotiv and NeuroSky are developing low-cost, easily portable EEG-based consumer-grade devices that can be used in various application domains such as gaming and education. This article discusses the areas in which EEG has been applied and how it has proven beneficial for those with severe motor disorders, for rehabilitation, and as a means of communicating with the outside world. It then examines the use of the SVM, k-NN, and decision tree algorithms to classify EEG signals. To reduce the complexity of the data, the maximum overlap discrete wavelet transform (MODWT) is used to extract EEG features, and the mean within each window is calculated using the sliding window technique. The resulting feature vectors are fed to the support vector machine (SVM), k-Nearest Neighbor, and optimized decision tree classifiers.
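A sketch of this feature pipeline. PyWavelets' stationary wavelet transform (`pywt.swt`) is used here as a stand-in for MODWT (both are undecimated transforms), and the EEG signals and labels are simulated placeholders.

```python
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def wavelet_window_features(signal, wavelet="db4", level=3, window=64):
    coeffs = pywt.swt(signal, wavelet, level=level)      # list of (cA, cD) pairs
    bands = [c for pair in coeffs for c in pair]
    feats = []
    for band in bands:
        # Sliding-window mean over each wavelet sub-band.
        means = [band[i:i + window].mean() for i in range(0, len(band), window)]
        feats.extend(means)
    return np.array(feats)

rng = np.random.default_rng(0)
n_trials, length = 60, 512                               # length divisible by 2**level
X = np.array([wavelet_window_features(rng.standard_normal(length)) for _ in range(n_trials)])
y = rng.integers(0, 2, n_trials)                         # placeholder class labels

for name, clf in {"SVM": SVC(), "kNN": KNeighborsClassifier(),
                  "DT": DecisionTreeClassifier(random_state=0)}.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```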
Due to the changing flow conditions during a pipeline's operation, erosion, damage, and failure occur at several locations. Leak prevention and early leak detection techniques are the best pipeline risk mitigation measures, and pipeline models that can simulate such breaches are essential for reducing detection time. In this study, numerical modeling using COMSOL Multiphysics is proposed for different fluid types, velocities, pressure distributions, and temperature distributions. The system consists of 12 meters of 8-inch pipe with a movable 5-inch ball placed inside. The findings show that dead zones occur more often in oil than in gas, and that the gas phase's low thermal conductivity facilitates pipe insulation. Fluid mixing is improved at 2.5 m/s when the temperature is at its lowest. Oil viscosity and dead zones lower the maximum pressure more than in water and gas. Pressure decreases with maximum velocity, and vice versa. The acquired oil dataset is used to train Support Vector Machine and Decision Tree models in MATLAB R2021a, ensuring the precision of the measurement. The classification results reveal that the Support Vector Machine (SVM) and Decision Tree (DT) models achieve the best average accuracies of 98.8% and 99.87%, respectively.
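An illustrative Python stand-in for the MATLAB classification step (the original work used MATLAB R2021a): train SVM and Decision Tree models on flow features exported from the simulation runs. The CSV name, feature columns, and class column are assumptions.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("comsol_oil_runs.csv")                  # hypothetical COMSOL export
X = df[["velocity", "pressure", "temperature"]]
y = df["flow_condition"]                                 # e.g. leak / no-leak class

svm = make_pipeline(StandardScaler(), SVC())
tree = DecisionTreeClassifier(random_state=0)
print("SVM :", cross_val_score(svm, X, y, cv=5).mean())
print("DT  :", cross_val_score(tree, X, y, cv=5).mean())
```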
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) has caused widespread mortality. Infected individuals exhibited specific radiographic visual features along with fever, dry cough, lethargy, dyspnea, and other symptoms. The chest X-ray (CXR) is one of the essential non-invasive clinical adjuncts for detecting the visual signs associated with SARS-CoV-2. Manual diagnosis is hindered by the limited availability of radiologists to interpret CXR images and by the faint appearance of the radiographic signs of illness. This paper describes an automatic COVID-19 detection system based on deep learning that applies transfer learning to extract features from CXR images and distinguish between classes. The system has three main components. The first extracts CXR features with MobileNetV2. The second applies dimensionality reduction to the extracted features using LDA. The final component is a classifier, which employs XGBoost to classify dataset images into Normal, Pneumonia, and COVID-19. The proposed system achieved fast and strong results, with an overall accuracy of 0.96, precision of 0.95, recall of 0.94, and F1 score of 0.94.
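A sketch of this three-stage pipeline (MobileNetV2 features, LDA reduction, XGBoost classification); the preprocessed image and label arrays are hypothetical placeholders for the CXR dataset.

```python
import numpy as np
import tensorflow as tf
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Stage 1: frozen MobileNetV2 as a feature extractor (1280-d global-pooled features).
backbone = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                             pooling="avg", input_shape=(224, 224, 3))

def extract_features(images):                             # images: (N, 224, 224, 3)
    x = tf.keras.applications.mobilenet_v2.preprocess_input(images.astype("float32"))
    return backbone.predict(x, verbose=0)

images = np.load("cxr_images.npy")                        # hypothetical preprocessed CXRs
labels = np.load("cxr_labels.npy")                        # 0=Normal, 1=Pneumonia, 2=COVID-19
feats = extract_features(images)

X_train, X_test, y_train, y_test = train_test_split(feats, labels, test_size=0.2,
                                                    stratify=labels, random_state=0)
# Stage 2: LDA reduces the features to (n_classes - 1) = 2 dimensions.
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
# Stage 3: XGBoost classifies the reduced features.
clf = XGBClassifier(eval_metric="mlogloss").fit(lda.transform(X_train), y_train)
print("accuracy:", clf.score(lda.transform(X_test), y_test))
```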
In today’s world, the data generated by many applications are increasing drastically, and finding an optimal subset of features from the data has become a crucial task. The main objective of this review is to analyze and comprehend different stochastic local search algorithms to find an optimal feature subset. Simulated annealing, tabu search, genetic programming, genetic algorithm, particle swarm optimization, artificial bee colony, grey wolf optimization, and bat algorithm, which have been used in feature selection, are discussed. This review also highlights the filter and wrapper approaches for feature selection. Furthermore, this review highlights the main components of stochastic local search algorithms, categorizes these algorithms in accordance with the type, and discusses the promising research directions for such algorithms in future research of feature selection.
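A minimal sketch of one of the reviewed ideas, a wrapper-style stochastic local search for feature selection, using simulated annealing with a k-NN evaluator on a toy dataset; the neighbourhood move and cooling schedule are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)
n_features = X.shape[1]

def fitness(mask):
    """Wrapper evaluation: cross-validated accuracy of k-NN on the selected subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()

current = rng.random(n_features) < 0.5                   # random initial subset
current_fit = fitness(current)
best, best_fit = current.copy(), current_fit
temperature = 1.0
for step in range(200):
    neighbour = current.copy()
    neighbour[rng.integers(n_features)] ^= True           # flip one feature in or out
    neighbour_fit = fitness(neighbour)
    # Accept better subsets always; worse ones with a temperature-dependent probability.
    if neighbour_fit > current_fit or rng.random() < np.exp((neighbour_fit - current_fit) / temperature):
        current, current_fit = neighbour, neighbour_fit
    if current_fit > best_fit:
        best, best_fit = current.copy(), current_fit
    temperature *= 0.97                                    # geometric cooling schedule

print("selected features:", np.flatnonzero(best), "CV accuracy:", round(best_fit, 3))
```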
The reliance on networks and systems has grown rapidly in contemporary times, leading to increased vulnerability to cyber assaults. The Distributed Denial-of-Service (DDoS) attack is a threat that can cause great financial liabilities and reputational damage. To address this problem, Machine Learning (ML) algorithms have gained significant attention, enabling the detection and prevention of DDoS attacks. In this study, we propose a novel security mechanism to counter DDoS attacks. It uses an ensemble learning methodology that differentiates between normal network traffic and the malicious flood of DDoS attack traffic. The study also evaluates the performance of two well-known ML algorithms, namely the decision tree and the random forest, which were used to implement the proposed method for defending against DDoS attacks. We test the models on a publicly available dataset called the Time Series Dataset for Distributed Denial of Service Attack Detection and compare their performance using a set of evaluation metrics. Model development involves fetching the data, preprocessing it, splitting it into training and testing subsets, model selection, and validation. When applied to a database of nearly 11,000 time series, the proposed approach showed promising results, in some cases reaching an accuracy (ACC) of up to 100% on the dataset. Ultimately, the proposed method detects and mitigates DDoS attacks, helping secure communication systems against this growing cyber threat by preventing attacks from succeeding.
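A sketch of the detection step: summary features derived from each traffic time series (the aggregation below is an assumption, as are the file and column names) feed a decision tree, a random forest, and a soft-voting ensemble of the two.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("ddos_time_series.csv")                  # hypothetical: one series per row
y = df["label"]                                           # 0 = normal, 1 = DDoS flood
series = df.drop(columns=["label"]).to_numpy()
# Per-series summary features: mean, std, max, and burstiness of packet counts.
X = np.column_stack([series.mean(1), series.std(1), series.max(1),
                     series.max(1) / (series.mean(1) + 1e-8)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)
dt = DecisionTreeClassifier(random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
ensemble = VotingClassifier([("dt", dt), ("rf", rf)], voting="soft")
for name, model in [("DT", dt), ("RF", rf), ("Ensemble", ensemble)]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```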
Detecting pulmonary cancers at early stages is difficult but crucial for patient survival. Therefore, it is essential to develop an intelligent, autonomous, and accurate lung cancer detection system that shows great reliability compared to previous systems and research. In this study, we have developed an innovative lung cancer detection system known as the Hybrid Lung Cancer Stage Classifier and Diagnosis Model (Hybrid-LCSCDM). This system simplifies the complex task of diagnosing lung cancer by categorizing patients into three classes: normal, benign, and malignant, by analyzing computed tomography (CT) scans using a two-part approach: First, feature extraction is conducted using a pre-trained model called VGG-16 for detecting key features in lung CT scans indicative of cancer. Second, these features are then classified using a machine learning technique called XGBoost, which sorts the scans into three categories. A dataset, IQ-OTH/NCCD - Lung Cancer, is used to train and evaluate the proposed model to show its effectiveness. The dataset consists of the three aforementioned classes containing 1190 images. Our suggested strategy achieved an overall accuracy of 98.54%, while the classification precision among the three classes was 98.63%. Considering the accuracy, recall, and precision as well as the F1-score evaluation metrics, the results indicated that when using solely computed tomography scans, the proposed (Hybrid-LCSCDM) model outperforms all previously published models.
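A sketch of the two-part Hybrid-LCSCDM idea: a frozen VGG-16 extracts CT-scan features, and XGBoost classifies them into normal, benign, and malignant. The preprocessed scan and label arrays are assumptions, not the IQ-OTH/NCCD loading code.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Part 1: pre-trained VGG-16 as a feature extractor (512-d global-pooled features).
vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                  pooling="avg", input_shape=(224, 224, 3))

def vgg_features(images):                                  # images: (N, 224, 224, 3)
    x = tf.keras.applications.vgg16.preprocess_input(images.astype("float32"))
    return vgg.predict(x, verbose=0)

scans = np.load("ct_scans.npy")                            # hypothetical preprocessed slices
labels = np.load("ct_labels.npy")                          # 0=normal, 1=benign, 2=malignant
X_train, X_test, y_train, y_test = train_test_split(vgg_features(scans), labels,
                                                    test_size=0.2, stratify=labels,
                                                    random_state=0)
# Part 2: XGBoost classifies the extracted features into the three classes.
clf = XGBClassifier(eval_metric="mlogloss").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```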
Breast cancer is one of the most critical diseases suffered by many people around the world, making it one of the most common medical risks they will face. It is among the leading causes of cancer death worldwide, and early detection is difficult. In the field of healthcare, where early diagnosis based on machine learning (ML) helps save patients' lives from the risks of disease, better-performing diagnostic procedures are crucial. ML models have been used to improve the effectiveness of early diagnosis. In this paper, we proposed a new feature selection method that combines two filter methods, Pearson correlation and mutual information (PC-MI), to analyse the correlation amongst features and then select important features before passing them to a classification model. Our method is capable of early breast cancer prediction and depends on a soft voting classifier that combines a certain set of ML models (decision tree, logistic regression and support vector machine) to produce one model that carries the strengths of the combined models, yielding the best prediction accuracy. Our work is evaluated using the Wisconsin Diagnostic Breast Cancer dataset. The proposed methodology outperforms previous work, achieving 99.3% accuracy, an F1 score of 0.9922, a recall of 0.9846, a precision of 1 and an AUC of 0.9923. Furthermore, the accuracy of 10-fold cross-validation is 98.2%.
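A sketch of the PC-MI idea on the (built-in) Wisconsin Diagnostic Breast Cancer data: rank features by Pearson correlation with the target and by mutual information, keep those strong under both filters (the exact combination rule and the value of k are assumptions), then train the soft-voting ensemble of the three named models.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

X, y = load_breast_cancer(return_X_y=True)
# Filter 1: absolute Pearson correlation of each feature with the label.
pearson = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
# Filter 2: mutual information between each feature and the label.
mi = mutual_info_classif(X, y, random_state=0)
k = 15
selected = np.intersect1d(np.argsort(pearson)[-k:], np.argsort(mi)[-k:])

X_train, X_test, y_train, y_test = train_test_split(X[:, selected], y, test_size=0.2,
                                                    stratify=y, random_state=0)
voter = VotingClassifier([
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
], voting="soft")
voter.fit(X_train, y_train)
print("accuracy:", voter.score(X_test, y_test))
```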
With the recent developments in technology and the advances in artificial intelligence and machine learning techniques, it has become possible for robots to understand and respond to voice as part of Human-Robot Interaction (HRI). A voice-based interface robot can recognize speech information from humans so that it can interact more naturally with its human counterpart in different environments. In this work, a review of voice-based interfaces for HRI systems is presented. The review focuses on voice-based perception in HRI systems from three facets: feature extraction, dimensionality reduction, and semantic understanding. For feature extraction, numerous types of features have been reviewed in various domains, such as the time, frequency, cepstral (i.e. obtained by applying the inverse Fourier transform to the logarithm of the signal spectrum), and deep domains. For dimensionality reduction, subspace learning can be used to eliminate the redundancies of high-dimensional features by further processing the extracted features to better reflect their semantic information. For semantic understanding, the aim is to infer objects or human behaviors from the extracted features. Numerous types of semantic understanding have been reviewed, such as speech recognition, speaker recognition, speaker gender detection, speaker gender and age estimation, and speaker localization. Finally, some of the existing voice-based interface issues and recommendations for future works have been outlined.
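A minimal sketch of the cepstral-feature stage, assuming librosa is available: MFCCs (a standard cepstral representation) are extracted per utterance, reduced with PCA as a simple subspace method, and fed to an SVM for a task such as speaker or gender recognition. The audio paths and labels would come from a labeled corpus and are not provided here.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(path, n_mfcc=13):
    """Utterance-level cepstral features: per-coefficient mean and std of MFCCs."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Subspace reduction (PCA) followed by an SVM classifier; with a labeled corpus
# of audio files `paths` and labels `labels`, the pipeline would be trained as:
#   X = np.array([mfcc_features(p) for p in paths]); model.fit(X, labels)
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())
```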