Page 159 - 2023-Vol19-Issue2

155 |                                                             Mohammed, Oraibi & Hussain

TABLE I. Performance Comparison.

Method                  ACC     PREC    REC     F1-S
Srivastava et al. [3]   89.7%   -       -       -
Scott et al. [23]       -       71.3%   86.3%   -
CNN 7 [25]              -       68.8%   83.0%   -
Adaboost CNN [24]       -       71.3%   86.3%   -
Bagging CNN [24]        -       72.4%   92.4%   -
DTLDN-CBIRA [27]        -       -       81.9%   89.9%
Xception                93.1%   93.2%   93.1%   93.0%
MobileNet               81.8%   85.3%   81.8%   81.8%
Inception               91.7%   92.0%   91.7%   91.8%
EL                      94.7%   95.0%   94.6%   94.6%

residual connections. This structure aids in reducing memory requirements and computational costs. By dividing the separable convolution in Xception, space-wise and channel-wise features are learned, resolving representational bottlenecks and vanishing gradients.

The Inception model, another CNN, is used specifically for extracting features from both query images and database images. It optimizes the network by factoring convolutions into distinct branches that operate on space and channels in succession. This allows the model to learn multiscale representations while reducing the overall number of parameters and the computational complexity.

MobileNet is another model we use; it provides a balance between computational efficiency and model accuracy. It is particularly useful for applications that require lightweight models for deployment on devices with limited computational resources. Lastly, we employ Ensemble Learning to combine the strengths of the individual models and improve the overall performance. Ensemble Learning increases the robustness and stability of our CBIR system, leading to improved accuracy.

Each of these models demonstrated high accuracy in our tests, with some room for improvement in certain classes. For example, the Xception model achieved an overall accuracy of 93.1%, while the Inception model had an overall precision and recall of about 92%. MobileNet achieved an accuracy of 81.8%, and the Ensemble Learning model achieved the best overall accuracy of 94.7%. These results validate our choice of DL models, demonstrating their effectiveness in the CBIR task. However, we acknowledge that there is scope for improvement in some classes, and further analysis may be needed to understand why this is the case and how to improve the models' performance on those classes.

VI. CONCLUSIONS AND FUTURE WORK

In conclusion, this study presents a new approach for feature extraction in Content-Based Image Retrieval (CBIR) using three state-of-the-art pre-trained deep learning architectures: Xception, MobileNet, and Inception, combined using a hard voting ensemble approach. The approach was tested on a practical and challenging dataset called CBIR 50 and showed improved ACC, PREC, REC, and F1-S compared to other methods. In addition, the experiments in this paper showed that combining these three architectures with ensemble learning outperformed each individual architecture applied to the same number of classes of the CBIR 50 dataset. All four methods achieved high performance, with Ensemble Learning outperforming the others at approximately 95% ACC, PREC, REC, and F1-S. These results demonstrate the effectiveness of the proposed approach in CBIR and the robustness of Xception, Inception, MobileNet, and Ensemble Learning in image classification tasks. This research highlights the potential of these methods in various applications and motivates further research in this field.

As future work, the proposed approach could be applied to other datasets and domains to test its robustness and generalizability. Further research could combine the proposed approach with other image retrieval techniques, such as text-based or hybrid methods, to improve overall performance. Another avenue of exploration would be to apply the ensemble method to other DL models to enhance their performance. Analyzing the performance of the proposed approach on large-scale datasets and improving its computational efficiency are also directions for future work.

CONFLICT OF INTEREST

The authors have no conflict of relevant interest to this article.

REFERENCES

[1] X. Li, J. Yang, and J. Ma, "Recent developments of content-based image retrieval (CBIR)," Neurocomputing, vol. 452, pp. 675-689, 2021.

[2] L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Information Fusion, vol. 82, pp. 28-42, 2022.

[3] P. Srivastava and A. Khare, "Integration of wavelet transform, local binary patterns and moments for content-based image retrieval," Journal of Visual Communication and Image Representation, vol. 42, pp. 78-103, 2017.

[4] X. Zhang, H. Zhai, J. Liu, Z. Wang, and H. Sun, "Real-time infrared and visible image fusion network using
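The hard-voting ensemble described here can be sketched in a few lines: each model predicts one class label per image, and the ensemble keeps the majority label. The model names and class labels below are illustrative, not the CBIR 50 classes:

```python
from collections import Counter

def hard_vote(predictions):
    """predictions: list of per-model label lists, one label per image.
    Returns the majority label for each image (hard voting)."""
    n_images = len(predictions[0])
    voted = []
    for i in range(n_images):
        votes = [model_preds[i] for model_preds in predictions]
        # most_common(1) yields the label with the most votes; on a tie,
        # the first label to reach that count wins.
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted

# Hypothetical per-image predictions from the three base models.
xception  = ["cat", "dog", "car"]
mobilenet = ["cat", "cat", "car"]
inception = ["dog", "dog", "bus"]
print(hard_vote([xception, mobilenet, inception]))  # → ['cat', 'dog', 'car']
```

With three voters a strict majority exists whenever at least two models agree, which is why the ensemble can correct an error made by any single model.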
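As a sketch of how the ACC, PREC, REC, and F1-S figures in Table I can be computed, the snippet below macro-averages precision, recall, and F1 over classes; macro averaging is an assumption here, since the paper does not state its averaging scheme, and the label lists are illustrative:

```python
# Accuracy plus macro-averaged precision, recall, and F1 over classes.
# NOTE: macro averaging is assumed; y_true / y_pred below are toy data.

def macro_metrics(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(classes)
    return acc, sum(precs) / n, sum(recs) / n, sum(f1s) / n

acc, prec, rec, f1 = macro_metrics([0, 0, 1, 1], [0, 1, 1, 1])
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
# → 0.75 0.83 0.75 0.73
```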