Page 106 - IJEEE-2023-Vol19-ISSUE-1

102 |                                                                                                          Abed, Wali, & Alaziz

ball detecting leaks within the pipeline using acoustic signals. With this information, informed estimates can be made about where the leak is and how fast it is spreading. The velocity, pressure, and temperature profiles will be used to calibrate the internal control system using the Support Vector Machine (SVM) and Decision Tree (DT) algorithms to classify leaks using the ball. The control system factors in the disturbances brought about by the fluid flow around the ball and establishes a relationship between the sound pressure levels and the detection of leaks.

  Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to imitate how humans learn, gradually improving its accuracy. Several kinds of machine learning algorithms are commonly employed, including the Support Vector Machine (SVM), Decision Trees (DT), and K-Nearest-Neighbor (KNN). This paper uses SVM and DT for comparison. The obtained velocity, pressure, and temperature distribution data, where the leakage sound energy is applied, are employed within the linear regression algorithm using SVM [27]. The detailed steps for developing the current SVM system are illustrated in Fig. 1.

                   Fig. 1: SVM detailed steps.

   MATLAB R2021a is used throughout this study to carry out the Support Vector Machine (SVM) calculations, where the optimum correction curve is evaluated statistically using the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \qquad (1)$$

      where RMSE is the root-mean-square deviation of the resultant error,

y_i: the response (observed) values,

\hat{y}_i: the corresponding values predicted by the model,

n: the number of data points.

   The root-mean-square error (RMSE) is a widely used measure of the differences between the values (both sample and population values) predicted by a model or estimator and the values that are observed. The resultant optimization model is used to correct the measurements. Confusion matrices are a useful tool for explaining or summarizing the results of a classification problem. Here, the confusion matrix is obtained from the SVM developed with the linear regression algorithm. A confusion matrix's most important function is to provide a class-by-class breakdown of the number of correct and incorrect predictions. The performance of the machine learning method can be investigated using precision, accuracy, recall, and F1-score. The confusion matrix defines the following [28], [29]:

- True Positive (TP): Both the observed value and the predicted value are positive.

- False Negative (FN): The observed value is positive, but it is predicted to be negative.

- True Negative (TN): Both the observed value and the predicted value are negative.

- False Positive (FP): The observed value is negative, but it is predicted to be positive.

   For classifiers, the Receiver Operating Characteristic (ROC) graph is a useful tool for assessing performance and selecting the most suitable classifier. The true positive rate is plotted on the Y axis of the ROC curve, and the false positive rate on the X axis. The "ideal" point, at the top-left corner of the plot, has a false positive rate of zero and a true positive rate of one. This point is rarely reached in practice; nevertheless, it does show that a larger area under the curve (AUC) is desirable in the majority of circumstances [30], [31]. The "steepness" of the ROC curve also matters, since the ideal is to increase the TP rate while decreasing the FP rate. ROC curves are frequently used in binary classification to evaluate the output of a classifier, and this is exactly what is done here, because the classification task is whether or not a leakage is identified [31].
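The evaluation quantities described above can be illustrated with a minimal sketch. It is written in plain Python rather than the MATLAB environment used in this study, and it is not the authors' implementation. The function names (`confusion_counts`, `classification_metrics`, `roc_points`, `auc`), the label convention (1 = leak, 0 = no leak), and the small example arrays are assumptions made only for illustration; they are not data from the paper.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels (assumed: 1 = leak, 0 = no leak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR), sweeping the decision threshold from high to low.

    Assumes y_true contains at least one positive and one negative sample.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(y_true, y_pred)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Illustrative labels, hard predictions, and classifier scores (not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
area = auc(roc_points(y_true, scores))
print(counts, metrics, area)
```

A curve that rises steeply toward the top-left corner gives an area close to one, which is why a larger AUC is usually preferred.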

   The SVM and DT were trained using our dataset generated from simulation, with extra leakage points added for optimum performance during training and testing. The data is divided using K-fold cross-validation with K = 10, with 70% of the data randomly selected for training and 30% for testing, and the accuracy is evaluated at each iteration. The data consist of three parameters (velocity, pressure, and temperature), each evaluated at three velocity values (0.1 m/s, 1 m/s, and 2.5 m/s). The confusion matrices of the SVM for the training and testing data are shown in Fig. 2, and those of the DT for the training and testing data are shown in Fig. 3.
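The two data-partitioning schemes mentioned above, K-fold cross-validation with K = 10 and a random 70%/30% split, can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the MATLAB procedure used in this study: the helper names `k_fold_indices` and `holdout_indices`, the fixed random seed, and the sample count of 100 are hypothetical, and the text does not specify how the two schemes were combined.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the sample indices and return k (train, test) index pairs.

    Each sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def holdout_indices(n_samples: int, train_fraction: float = 0.7,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random hold-out split, e.g. 70% for training and 30% for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


# Illustrative sample count (not from the paper).
n = 100
for fold_number, (train_idx, test_idx) in enumerate(k_fold_indices(n, k=10), start=1):
    # A classifier would be fitted on train_idx and scored on test_idx here,
    # and the per-fold accuracies averaged afterwards.
    print(fold_number, len(train_idx), len(test_idx))

train_idx, test_idx = holdout_indices(n, train_fraction=0.7)
print(len(train_idx), len(test_idx))
```

Averaging the accuracy over the ten folds gives a less split-dependent estimate than a single hold-out partition, which is the usual motivation for evaluating at each iteration.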