Regional Information Center for Science and Technology - Towards improving the performance of text/language independent speaker recognition systems

DocumentCode :

1794695

Title :

Towards improving the performance of text/language independent speaker recognition systems

Author :

George, Kuruvachan K. ; Arunraj, K. ; Sreekumar, K.T. ; Kumar, C. Senthil ; Ramachandran, K.I.

Author_Institution :

Machine Intell. Res. Lab., Amrita Vishwa Vidyapeetham, Coimbatore, India

fYear :

2014

fDate :

6-11 Jan. 2014

Firstpage :

Lastpage :

Abstract :

Speaker recognition has been an active area of research for the last few decades because of its applications in national security and forensics. In this work, we present the details of a speaker recognition system developed using a universal background model and support vector machines (UBM-SVM). We explored several techniques to improve the performance of the baseline system, which uses mel frequency cepstral coefficients (MFCC) as input features. We developed and tested the speaker recognition system for 200 speakers, using data collected over 13 different channels, such as handset regular phone, speaker phone, regular phone headphone, and regular phone. We experimented with RelAtive SpecTrA (RASTA) processing and feature warping on the input MFCC features, and with nuisance attribute projection (NAP) on the Gaussian mixture model supervectors derived in the system. These techniques improved the system performance significantly by minimizing the effect of the different channels. The details of the system implementation and the results are presented in this paper. The complete system is developed in MATLAB and C/C++.

Keywords :

Gaussian processes; speaker recognition; support vector machines; text analysis; Gaussian mixture model supervectors; MFCC; NAP; RASTA processing; UBM-SVM; baseline system; feature warping; forensic applications; handset regular phone; mel frequency cepstral coefficients; national security; nuisance attribute projection; regular phone; regular phone headphone; relative spectra; speaker phone; text-language independent speaker recognition systems; universal background model and support vector machines; Adaptation models; Feature extraction; Mel frequency cepstral coefficient; Speaker recognition; Speech; Support vector machines; Training data; Feature Warping; Gaussian mixture model; Maximum a Posteriori (MAP) Adaptation; RASTA filtering; Speaker Recognition System; VAD; nuisance attribute projection; support vector machine

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Power Signals Control and Computations (EPSCICON), 2014 International Conference on

Conference_Location :

Thrissur

Print_ISBN :

978-1-4799-3611-3

Type :

conf

DOI :

10.1109/EPSCICON.2014.6887506

Filename :

6887506

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1794695