Page 159 - 2023-Vol19-Issue2

155 |                                                             Mohammed, Oraibi & Hussain

TABLE I. Performance Comparison.

Method                  ACC     PREC    REC     F1-S
Srivastava et al. [3]   89.7%   -       -       -
Scott et al. [23]       -       71.3%   86.3%   -
CNN 7 [25]              -       68.8%   83.0%   -
Adaboost CNN [24]       -       71.3%   86.3%   -
Bagging CNN [24]        -       72.4%   92.4%   -
DTLDN-CBIRA [27]        -       -       81.9%   89.9%
Xception                93.1%   93.2%   93.1%   93.0%
MobileNet               81.8%   85.3%   81.8%   81.8%
Inception               91.7%   92.0%   91.7%   91.8%
EL                      94.7%   95.0%   94.6%   94.6%

residual connections. This structure aids in reducing memory requirements and computational costs. By dividing the separable convolution in Xception, space-wise and channel-wise features are learned, resolving representational bottlenecks and vanishing gradients.

The Inception model, another CNN, is used specifically for extracting features from both query images and database images. It optimizes the network by factoring convolutions into distinct branches that operate on space and channels in succession. This allows the model to learn multiscale representations while reducing the overall number of parameters and the computational complexity.

MobileNet is another model we use; it provides a balance between computational efficiency and model accuracy. It is particularly useful for applications that require lightweight models for deployment on devices with limited computational resources. Lastly, we employ Ensemble Learning to combine the strengths of the individual models and improve the overall performance. Ensemble Learning increases the robustness and stability of our CBIR system, leading to improved accuracy.

Each of these models demonstrated high accuracy in our tests, with some room for improvement in certain classes. For example, the Xception model achieved an overall accuracy of 93.1%, while the Inception model had an overall precision and recall of about 92%. MobileNet achieved an accuracy of 81.8%, and the Ensemble Learning model achieved the best overall accuracy of 94.7%. These results validate our choice of DL models, demonstrating their effectiveness in the CBIR task. However, we acknowledge that there is scope for improvement in some classes, and further analysis may be needed to understand why this is the case and how to improve the models' performance on those classes.

VI. CONCLUSIONS AND FUTURE WORK

In conclusion, this study presents a new approach for feature extraction in Content-Based Image Retrieval (CBIR) using three state-of-the-art pre-trained deep learning architectures: Xception, MobileNet, and Inception, combined using a hard voting ensemble approach. The approach was tested on a practical and challenging dataset called CBIR 50 and showed improved ACC, PREC, REC, and F1-S compared to other methods. In addition, the experiments in this paper showed that combining these three architectures with ensemble learning outperformed each individual architecture applied to the same number of classes of the CBIR 50 dataset. All four methods achieved high performance, with Ensemble Learning outperforming the others at approximately 95% ACC, PREC, REC, and F1-S. These results demonstrate the effectiveness of the proposed approach in CBIR and the robustness of Xception, Inception, MobileNet, and Ensemble Learning in image classification tasks. This research highlights the potential of these methods in various applications and motivates further research in this field.

As future work, the proposed approach could be applied to other datasets and domains to test its robustness and generalizability. Further research could combine the proposed approach with other image retrieval techniques, such as text-based or hybrid methods, to improve overall performance. Another avenue of exploration would be to apply the ensemble method to other DL models to enhance their performance. Analyzing the performance of the proposed approach on large-scale datasets and improving its computational efficiency are also directions for future work.

CONFLICT OF INTEREST

The authors have no conflict of relevant interest to this article.

REFERENCES

[1] X. Li, J. Yang, and J. Ma, "Recent developments of content-based image retrieval (CBIR)," Neurocomputing, vol. 452, pp. 675-689, 2021.

[2] L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Information Fusion, vol. 82, pp. 28-42, 2022.

[3] P. Srivastava and A. Khare, "Integration of wavelet transform, local binary patterns and moments for content-based image retrieval," Journal of Visual Communication and Image Representation, vol. 42, pp. 78-103, 2017.

[4] X. Zhang, H. Zhai, J. Liu, Z. Wang, and H. Sun, "Real-time infrared and visible image fusion network using
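The hard-voting ensemble described here can be sketched in a few lines: each model predicts one class label per image, and the ensemble keeps the majority label. The model names and class labels below are illustrative, not the CBIR 50 classes:

```python
from collections import Counter

def hard_vote(predictions):
    """predictions: list of per-model label lists, one label per image.
    Returns the majority label for each image (hard voting)."""
    n_images = len(predictions[0])
    voted = []
    for i in range(n_images):
        votes = [model_preds[i] for model_preds in predictions]
        # most_common(1) yields the label with the most votes; on a tie,
        # the first label to reach that count wins.
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted

# Hypothetical per-image predictions from the three base models.
xception  = ["cat", "dog", "car"]
mobilenet = ["cat", "cat", "car"]
inception = ["dog", "dog", "bus"]
print(hard_vote([xception, mobilenet, inception]))  # → ['cat', 'dog', 'car']
```

With three voters a strict majority exists whenever at least two models agree, which is why the ensemble can correct an error made by any single model.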
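As a sketch of how the ACC, PREC, REC, and F1-S figures in Table I can be computed, the snippet below macro-averages precision, recall, and F1 over classes; macro averaging is an assumption here, since the paper does not state its averaging scheme, and the label lists are illustrative:

```python
# Accuracy plus macro-averaged precision, recall, and F1 over classes.
# NOTE: macro averaging is assumed; y_true / y_pred below are toy data.

def macro_metrics(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(classes)
    return acc, sum(precs) / n, sum(recs) / n, sum(f1s) / n

acc, prec, rec, f1 = macro_metrics([0, 0, 1, 1], [0, 1, 1, 1])
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
# → 0.75 0.83 0.75 0.73
```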