Content-Based Image Retrieval (CBIR) is an automatic process of retrieving the images most similar to a query image based on their visual content, such as colour and texture features. However, CBIR faces a technical challenge known as the semantic gap between high-level conceptual meaning and low-level image-based features. This paper presents a new method that addresses the semantic gap by exploiting cluster shapes. The method first extracts local colours and textures using Discrete Cosine Transform (DCT) coefficients. The Expectation-Maximization Gaussian Mixture Model (EM/GMM) clustering algorithm is then applied to the local feature vectors to obtain clusters of various shapes. To compare two images, the method uses a dissimilarity measure based on the principle of Kullback-Leibler divergence to compute the pair-wise dissimilarity of cluster shapes. The paper further investigates two scenarios: one in which the number of clusters is fixed, and one in which it is determined adaptively according to cluster quality. Experiments are conducted on the publicly available WANG and Caltech6 databases. The results demonstrate that the proposed retrieval mechanism based on cluster shapes increases image discrimination, and that fixing the number of clusters to a large value gives better retrieval precision than adaptively determining a relatively small number of clusters.
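The abstract does not give the exact dissimilarity measure, so the following is only a minimal sketch of the underlying idea, assuming each cluster is summarised by a Gaussian (mean and covariance) as in a GMM component. It uses the standard closed-form Kullback-Leibler divergence between two multivariate Gaussians and a symmetrised variant; the function names `kl_gaussian` and `symmetric_kl` are hypothetical and are not taken from the paper.

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    k = mu0.shape[0]                      # feature dimension
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet is numerically safer than log(det(...))
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + logdet1 - logdet0)

def symmetric_kl(mu0, cov0, mu1, cov1):
    """Symmetrised divergence (an assumed choice): KL(0||1) + KL(1||0)."""
    return (kl_gaussian(mu0, cov0, mu1, cov1)
            + kl_gaussian(mu1, cov1, mu0, cov0))
```

Because the covariance terms enter the formula directly, two clusters with the same mean but different spread or orientation still receive a non-zero dissimilarity, which is one way a measure of this kind can be sensitive to cluster shape.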
Speaker recognition refers to identifying a speaker by his or her voice. People talk in a variety of tones, and each speaking voice has features that distinguish one person from another. Speaker verification (SV) involves comparing a set of measurements of the speaker’s utterances with a reference for the person whose identity is being claimed, in order to accept or reject the speaker’s identity claim. Speaker verification consists of two steps: feature extraction and feature matching. This work presents an analysis of the correlations of Mel-scale coefficients of voice utterances to identify the intended speaker. Both short text-dependent words and text-independent words are considered in this study. The correlation accuracy ranged from 98% to 99% for user1 (same speaker) in the text-dependent case, whereas the correlation of user1 with other speakers was 83% and 61% for the text-dependent and text-independent cases, respectively. Furthermore, a Mel-frequency cepstral coefficient (MFCC) feature extraction approach based on the distributed Discrete Cosine Transform (DCT) is provided. SV tests are carried out using the MFCC feature extraction method, which yields close variance for the target speaker and distant variance for other speakers. Additionally, principal component analysis (PCA) is applied to improve the discriminative performance of the system; PCA chooses the optimal path between every pair of highly confusable speakers. The results obtained from PCA were similar to the correlation findings from the Mel-scale results, while enhancing the discriminative information and lowering the dimension of the MFCC data.
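The abstract does not specify how the correlation or the PCA step is computed. As a rough illustration only, the sketch below assumes a Pearson correlation between two coefficient vectors and a plain PCA obtained from the singular value decomposition of mean-centred feature frames. The function names `pearson_correlation` and `pca_reduce` are hypothetical, and the input is assumed to be an array of MFCC-like frames with one row per frame.

```python
import numpy as np

def pearson_correlation(a, b):
    """Pearson correlation between two equally sized coefficient arrays."""
    a = np.asarray(a, float).ravel()
    b = np.asarray(b, float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pca_reduce(features, n_components):
    """Project (n_frames, n_coeffs) features onto the top principal axes.

    Returns the projected data and the fraction of variance explained
    by each retained component.
    """
    X = np.asarray(features, float)
    Xc = X - X.mean(axis=0)               # centre each coefficient
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]        # rows are principal directions
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return Xc @ components.T, explained
```

In a setup like this, a high correlation between an utterance and the enrolled reference would support the identity claim, and the reduced representation would be used in place of the full coefficient vectors when comparing speakers.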