Speaker Verification Based on Mel Frequency Cepstral Coefficients and Correlation

Abdalem A. Rasheed

doi:10.37917/ijeee.22.1.8

Abstract

Speaker recognition refers to identifying a speaker by his or her voice. People speak in a variety of tones, and each voice has features that distinguish one person from another. Speaker verification (SV) compares a set of measurements of the speaker’s utterances with a reference for the person whose identity is being claimed, in order to accept or reject that claim. Speaker verification consists of two steps: feature extraction and feature matching. This work presents an analysis of the correlations of Mel-scale coefficients of an utterance to identify the intended speaker. Both a short text-dependent word and text-independent words are considered in this study. The correlation accuracy ranged from 98% to 99% for user1 (same speaker) in the text-dependent case, whereas the correlation of user1 with other speakers was 83% and 61% for the text-dependent and text-independent cases, respectively. Furthermore, a Mel-frequency cepstral coefficient (MFCC) feature extraction approach based on a distributed Discrete Cosine Transform (DCT) is presented. SV tests are carried out using the MFCC feature extraction method, yielding close variance for the target speaker and distant variance for other speakers. Additionally, principal component analysis (PCA) is applied to improve the discriminative performance of the system; the PCA chooses the optimal path between every pair of highly confusable speakers. The results obtained from PCA were similar to the correlation findings from the Mel-scale results, while enhancing the discriminative information and lowering the dimension of the MFCC data.
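
The feature-matching step described in the abstract can be illustrated with a minimal sketch, assuming each utterance has already been reduced to a fixed-length vector of MFCC values. The function names, the example vectors, and the 0.9 acceptance threshold are hypothetical and are not taken from the paper; the sketch only shows how a Pearson correlation score might be turned into an accept/reject decision.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center both vectors so the score ignores a constant offset.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator


def verify_speaker(enrolled_mfcc, test_mfcc, threshold=0.9):
    """Accept the identity claim if the correlation reaches the threshold.

    The threshold value is an illustrative assumption, not a value from the paper.
    """
    return pearson_correlation(enrolled_mfcc, test_mfcc) >= threshold


if __name__ == "__main__":
    # Made-up feature vectors, for illustration only.
    enrolled = [12.1, -3.4, 5.6, 0.8, -1.9, 2.3]
    claimed = [11.8, -3.1, 5.9, 0.5, -2.2, 2.0]
    score = pearson_correlation(enrolled, claimed)
    print(f"correlation = {score:.3f}, accepted = {verify_speaker(enrolled, claimed)}")
```

In practice the threshold would be tuned on held-out utterances so that same-speaker scores fall above it and scores against other speakers fall below it.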


Vol. 22 No. 1 (2026)

Iraqi Journal for Electrical and Electronic Engineering
