Page 153 - 2023-Vol19-Issue2
149 | Mohammed, Oraibi & Hussain
Fig. 1. CBIR system structure. The CBIR system is separated into online and offline subsystems in accordance with two distinct information processing methods, with a common feature extraction block shared between them.

feature-based picture representation is essential. The CBIR system is built technologically on database search and picture representation. CBIR research can therefore be reviewed in terms of advances in database search and in picture synthesis, respectively.

B. Feature Extraction
The crucial stage in CBIR is image representation, which involves taking the important elements from an image and turning them into a fixed-size vector (the so-called feature vector; see Figure 2). The extracted features can generally be separated into three broad categories: conventional features, classification CNN features, and retrieval CNN features. In the next section, we provide the techniques for image representation for CBIR based on each of these three feature groups.

IV. METHODOLOGY
In this section, we present the dataset used in our work. In addition, the three DL architectures used in this paper are described. Finally, we discuss the methods we use for feature extraction in CBIR, as Figure 2 illustrates.

A. Dataset
Our dataset, designed for CBIR, comprises 3,843 JPEG images. These images are arranged into 20 different classes: TajMahal, Bottle, Shark, Lotus, Eiffel Tower, Jeans, Ship, Dalmatian, Obama, Apple, Maggi, Clock, Buddha, Modi, Helmet, Mobile, Peacock, Soccer Ball, Tabla, Horse. A sample image from each class is presented in Figure 3. Each class is further divided into subclasses based on different attributes. For instance, a "car" class would be subdivided into types of cars, such as sedans, SUVs, and sports cars, or even by car brand. This hierarchical organization supports a more nuanced and detailed retrieval process, thus enhancing the accuracy of the system.

The number and type of images in our dataset play a critical role in the performance of our CBIR system. The count of 3,843 images is substantial and helps in achieving diversity and generalization in our image retrieval model. This broad set of images enables the model to learn and differentiate between a wide range of categories and their subclasses. However, as with any machine learning task, more data generally helps. So while our number is a good start, we may need to supplement our dataset to enhance the model’s ability to understand and distinguish between complex and nuanced differences within and across categories.

The fact that the dataset contains only JPEG images also matters. JPEG is a common image format, and its widespread use is partly why we chose it. However, the JPEG format uses lossy compression, which may result in some loss of image detail. This could pose a challenge when the retrieval task requires fine-grained identification or discrimination. Moreover, different image formats may exhibit distinct characteristics or encode different levels of detail, which could affect the feature extraction process.

The collection of metadata such as image resolution, color depth, and file size helps in the retrieval process by providing additional dimensions for searching and matching. However, inconsistency in these metadata parameters (such as differing resolutions) can add to the complexity of the task and potentially affect the system’s performance.

In the CBIR task, different feature extraction methods can be used on our dataset, including Xception, MobileNet, Inception, and Ensemble Learning. These methods convert raw image data into a form that can be processed by our model. The choice of method may significantly affect the retrieval performance, and hence selecting an appropriate feature extraction strategy is another challenge with our dataset.

In conclusion, our dataset, while being a robust starting point for our CBIR system, does pose certain challenges that need to be addressed to ensure optimal performance. It is a reminder that dataset creation and management is as crucial a step as model selection and tuning in machine learning projects.
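The class and subclass organization and the file-level metadata described in this subsection can be captured in a simple catalog. The following is a minimal sketch, not the authors' tooling: it assumes a hypothetical directory layout of the form root/<class>/<subclass>/<image>.jpg, and the names ImageRecord, build_catalog, and class_counts are introduced here purely for illustration. Only file size is recorded, since reading resolution or color depth would require an image library.

```python
import os
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    path: str        # location of the JPEG file
    label: str       # top-level class, taken from the first folder level
    subclass: str    # subclass, taken from the second folder level (assumed layout)
    size_bytes: int  # file size, one of the metadata fields discussed above


def build_catalog(root):
    """Walk root/<class>/<subclass>/ and return one record per JPEG file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 2:  # keep only folders exactly two levels below root
            continue
        label, subclass = parts
        for name in sorted(filenames):
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                records.append(
                    ImageRecord(path, label, subclass, os.path.getsize(path))
                )
    return records


def class_counts(records):
    """Number of images per top-level class, useful for spotting imbalance."""
    return Counter(record.label for record in records)
```

Calling class_counts(build_catalog(root)) on such a folder tree gives a quick view of how the images are distributed over the classes, which is one way to check whether some categories need to be supplemented.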
B. Inception Model
Inception, a CNN architecture, is used here to extract features from both query images and database images [30, 31]. It benefits from factoring convolutions into distinct branches that operate on space and channels in succession, and it uses a wide range of strategies to optimize the network. The core idea behind the Inception framework is to replace large kernels with smaller ones in order to learn multiscale representations while lowering the number of parameters and the computational complexity [32, 33].
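The effect of kernel size on parameter count can be illustrated with simple arithmetic. The sketch below is a back-of-the-envelope calculation rather than a description of the network used in this paper: it assumes stride-1 convolutions, equal input and output channel counts, and no bias terms, and the channel width of 64 is an illustrative value. It compares a single 5x5 convolution with two stacked 3x3 convolutions, which cover the same 5x5 receptive field.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution (bias terms omitted)."""
    return kernel * kernel * c_in * c_out


def receptive_field(kernels):
    """Receptive field of stride-1 convolutions applied one after another."""
    return 1 + sum(k - 1 for k in kernels)


channels = 64  # illustrative channel width, not a value from the paper

single_5x5 = conv_params(5, channels, channels)
stacked_3x3 = 2 * conv_params(3, channels, channels)

print(receptive_field([5]), single_5x5)      # 5 102400
print(receptive_field([3, 3]), stacked_3x3)  # 5 73728
print(f"{1 - stacked_3x3 / single_5x5:.0%} fewer weights")  # 28% fewer weights
```

Under these assumptions the two stacked 3x3 layers need 18 weights per channel pair instead of 25, while seeing the same spatial neighborhood.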