DocumentCode
672345
Title
Improved cepstral mean and variance normalization using Bayesian framework
Author
Prasad, N. Vishnu ; Umesh, S.
Author_Institution
Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, India
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
156
Lastpage
161
Abstract
Cepstral Mean and Variance Normalization (CMVN) is a computationally efficient normalization technique for noise-robust speech recognition. The performance of CMVN is known to degrade for short utterances, both because there is insufficient data for parameter estimation and because discriminable information is lost when all utterances are forced to have zero mean and unit variance. In this work, we propose to use posterior estimates of the mean and variance in CMVN instead of the maximum likelihood estimates. This Bayesian approach, in addition to providing robust parameter estimates, is also shown to preserve discriminable information without increasing computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications. The relative word error rate (WER) reductions of this approach with respect to Cepstral Mean Normalization, CMVN and Histogram Equalization are (i) 40.1%, 27% and 4.3% on the Aurora2 database for all utterances, (ii) 25.7%, 38.6% and 30.4% on the Aurora2 database for short utterances, and (iii) 18.7%, 12.6% and 2.5% on the Aurora4 database.
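The contrast the abstract draws between maximum likelihood and posterior estimates can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not give. It assumes a simple shrinkage estimate in which the per-utterance statistics are interpolated with prior statistics, with the prior weighted as `tau` pseudo-frames. The function names and the parameters `prior_mean`, `prior_var` and `tau` are hypothetical, and a single cepstral dimension is used for brevity.

```python
def cmvn_ml(frames):
    """Standard CMVN for one cepstral dimension, using maximum likelihood
    estimates of the utterance mean and variance."""
    n = len(frames)
    mean = sum(frames) / n
    var = sum((x - mean) ** 2 for x in frames) / n
    std = max(var, 1e-12) ** 0.5  # floor guards against constant input
    return [(x - mean) / std for x in frames]


def cmvn_shrunk(frames, prior_mean=0.0, prior_var=1.0, tau=20.0):
    """CMVN with shrinkage estimates (an assumed stand-in for the paper's
    posterior estimates): the prior counts as `tau` pseudo-frames."""
    n = len(frames)
    ml_mean = sum(frames) / n
    ml_var = sum((x - ml_mean) ** 2 for x in frames) / n

    w = n / (n + tau)  # weight on the utterance data, grows with length
    mean = w * ml_mean + (1.0 - w) * prior_mean
    # Variance of the weighted combination, measured about the shrunk mean.
    var = (w * (ml_var + (ml_mean - mean) ** 2)
           + (1.0 - w) * (prior_var + (prior_mean - mean) ** 2))
    std = max(var, 1e-12) ** 0.5
    return [(x - mean) / std for x in frames]


if __name__ == "__main__":
    short = [2.0, 4.0]
    print(cmvn_ml(short))               # forced to zero mean, unit variance
    print(cmvn_shrunk(short, tau=2.0))  # keeps part of the utterance offset
```

Under these assumptions a short utterance is only partly normalized, so some of its offset and scale survive, while a long utterance gives nearly the same output as standard CMVN because `w` approaches 1. The cost stays at one pass over the frames plus a constant number of extra operations.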
Keywords
Bayes methods; maximum likelihood estimation; speech recognition; Aurora2 database; Aurora4 database; Bayesian framework; CMVN; IVR; WER reduction; discriminable information loss; histogram equalization; improved cepstral mean and variance normalization; interactive voice response-based applications; maximum likelihood estimates; noise robust speech recognition; normalization technique; parameter estimation; short utterances; Cepstral analysis; Databases; Hidden Markov models; Robustness; Training; Bayesian estimation; HEQ; Robust speech recognition; VTS
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707722
Filename
6707722