DocumentCode :
137234
Title :
A Bayesian approach to speaker normalization using vowel formant frequency
Author :
Ram, Dhananjay ; Kundu, Debasis ; Hegde, Rajesh M.
Author_Institution :
Indian Inst. of Technol., Kanpur, Kanpur, India
fYear :
2014
fDate :
Feb. 28 2014-March 2 2014
Firstpage :
1
Lastpage :
6
Abstract :
Large inter-speaker variation significantly degrades the performance of a speaker-independent speech recognition system. To compensate for this degradation, this paper proposes a novel Bayesian approach to estimating speaker normalization parameters. An affine model is used, which captures the variation in vocal tract length more effectively than the linear model used in the literature. The vocal tract length normalization (VTLN) parameters are estimated using least squares estimation (LSE) as well as a Bayesian approach that uses the Gibbs sampler, a Markov chain Monte Carlo method. Finally, a Mahalanobis-distance-based vowel recognizer is proposed, and experiments are performed for both gender-dependent and gender-independent cases. Results indicate a performance improvement for the Bayesian approach over LSE.
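As a rough illustration of two ideas named in the abstract, the sketch below fits an affine warp (reference ≈ a · speaker + b) by ordinary least squares and picks a vowel by smallest Mahalanobis distance in a two-formant (F1, F2) space. It is not the authors' implementation: the function names, the direction of the warp, the two-dimensional feature space, and all numeric values are assumptions made for illustration, and the Bayesian (Gibbs sampler) estimate is not shown.

```python
def fit_affine_warp(speaker_formants, reference_formants):
    """Least-squares fit of reference ~= a * speaker + b (closed form)."""
    n = len(speaker_formants)
    mean_x = sum(speaker_formants) / n
    mean_y = sum(reference_formants) / n
    sxx = sum((x - mean_x) ** 2 for x in speaker_formants)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(speaker_formants, reference_formants))
    a = sxy / sxx              # slope; a purely linear model would fix b = 0
    b = mean_y - a * mean_x    # offset, the extra term of the affine model
    return a, b


def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance for a 2-D point and a 2x2 covariance."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (c11, c12), (c21, c22) = cov
    det = c11 * c22 - c12 * c21
    # d^T * inverse(cov) * d, with the 2x2 inverse written out explicitly
    return (c22 * dx * dx - (c12 + c21) * dx * dy + c11 * dy * dy) / det


def classify_vowel(point, vowel_models):
    """Return the vowel label whose model is closest to the point."""
    return min(vowel_models,
               key=lambda v: mahalanobis_sq_2d(point, *vowel_models[v]))


# Synthetic values, chosen only to exercise the functions.
speaker = [300.0, 600.0, 900.0]
reference = [0.9 * f + 20.0 for f in speaker]
a, b = fit_affine_warp(speaker, reference)

# Hypothetical vowel models: (mean (F1, F2), 2x2 covariance).
vowel_models = {
    "i": ((300.0, 2300.0), ((2500.0, 0.0), (0.0, 40000.0))),
    "a": ((750.0, 1200.0), ((2500.0, 0.0), (0.0, 40000.0))),
}
label = classify_vowel((320.0, 2200.0), vowel_models)
```

The offset b is what separates the affine model from a purely linear warp. In a recognizer along these lines, a speaker's formants would be normalized with the fitted (a, b) before the distance comparison.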
Keywords :
Markov processes; Monte Carlo methods; belief networks; speaker recognition; LSE; Mahalanobis distance based vowel recognizer; Markov Chain Monte Carlo method; VTLN parameter; affine model; least square estimation; novel Bayesian approach; speaker independent speech recognition system; speaker normalization parameter; vocal tract length normalization parameter; vowel formant frequency; Accuracy; Bayes methods; Databases; Equations; Least squares approximations; Mathematical model; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications (NCC), 2014 Twentieth National Conference on
Conference_Location :
Kanpur
Type :
conf
DOI :
10.1109/NCC.2014.6811382
Filename :
6811382