Content-Based Image Retrieval (CBIR) is the automatic process of retrieving the images most similar to a query image based on their visual content, such as colour and texture features. However, CBIR faces a well-known technical challenge: the semantic gap between high-level conceptual meaning and low-level image features. This paper presents a new method that addresses the semantic gap by exploiting cluster shapes. The method first extracts local colour and texture features using Discrete Cosine Transform (DCT) coefficients. The Expectation-Maximization Gaussian Mixture Model (EM/GMM) clustering algorithm is then applied to the local feature vectors to obtain clusters of various shapes. To compare two images, the method measures the pairwise dissimilarity of their cluster shapes using a measure based on the Kullback-Leibler divergence. The paper further investigates two scenarios: one in which the number of clusters is fixed, and one in which it is determined adaptively according to cluster quality. Experiments are conducted on the publicly available WANG and Caltech6 databases. The results demonstrate that the proposed retrieval mechanism based on cluster shapes increases image discrimination, and that fixing the number of clusters to a large value yields better retrieval precision than adaptively determining a relatively smaller number of clusters.
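The abstract does not reproduce the paper's exact dissimilarity measure. As an illustration only, the Kullback-Leibler divergence between two Gaussian components of the kind produced by EM/GMM has a closed form, and a symmetrised version of it can serve as a pairwise cluster-shape dissimilarity. The function names below are ours, not the paper's:

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two
    multivariate Gaussians with means mu0, mu1 and covariances
    cov0, cov1 (shapes: (k,) and (k, k))."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(cov1_inv @ cov0)          # trace term: shape mismatch
        + diff @ cov1_inv @ diff           # Mahalanobis distance of means
        - k                                # dimensionality offset
        + np.log(np.linalg.det(cov1) / np.linalg.det(cov0))  # volume ratio
    )

def symmetric_kl(mu0, cov0, mu1, cov1):
    """Symmetrised KL, usable as a pairwise dissimilarity
    between two cluster shapes (KL itself is not symmetric)."""
    return gaussian_kl(mu0, cov0, mu1, cov1) + gaussian_kl(mu1, cov1, mu0, cov0)
```

An image-to-image dissimilarity would then aggregate `symmetric_kl` over matched cluster pairs from the two images' mixture models; the aggregation scheme is the paper's own contribution and is not sketched here.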
Advancements in internet accessibility and the affordability of digital image sensors have led to the proliferation of large image databases used across a multitude of applications. In this context, addressing the semantic gap between low-level attributes and human visual perception has become pivotal to refining Content-Based Image Retrieval (CBIR) methodologies. As this field is intensely researched, numerous efficient algorithms for CBIR systems have emerged, driving significant progress in artificial intelligence. In this study, we propose a hard voting ensemble approach applied to features derived from three robust deep learning architectures: Inception, Xception, and MobileNet, with the aim of bridging the divide between low-level image features and human visual perception. Euclidean distance is adopted as the similarity metric between the query image and the feature database. The outcome is a noticeable improvement in image retrieval accuracy. We applied our approach to a practical dataset named CBIR 50, which encompasses categories such as mobile phones, cars, cameras, and cats, and thereby validated the method's effectiveness. Our approach outperformed existing CBIR algorithms in accuracy (ACC), precision (PREC), recall (REC), and F1-score (F1-S), making it a noteworthy addition to the field of CBIR. The proposed methodology could potentially be extended to other sectors, such as medical imaging and surveillance systems, where image retrieval accuracy is of paramount importance.
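The abstract describes Euclidean matching per backbone followed by a hard vote across the three backbones. A minimal sketch of that scheme, assuming each backbone has its own precomputed feature database and that each backbone votes for the category label of its nearest database image (the function and variable names are ours):

```python
import numpy as np
from collections import Counter

def euclidean_nearest(query_feat, db_feats):
    """Index of the database image closest to the query feature
    under Euclidean distance."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return int(np.argmin(dists))

def hard_vote(query_feats, db_feats_per_model, db_labels):
    """Hard voting ensemble: each backbone votes for the label of its
    nearest database image; the majority label wins (ties broken by
    the earliest-cast vote).

    query_feats        : one query feature vector per backbone
    db_feats_per_model : one (n_images, dim) feature matrix per backbone
    db_labels          : category label of each database image
    """
    votes = [db_labels[euclidean_nearest(q, db)]
             for q, db in zip(query_feats, db_feats_per_model)]
    return Counter(votes).most_common(1)[0][0]
```

In a real system the three feature matrices would come from Inception, Xception, and MobileNet embeddings of the database images; here they are simply inputs, so the sketch is independent of any deep learning framework.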