DocumentCode :
394253
Title :
Discriminative acoustic model using eigenspace mapping for rapid speaker adaptation
Author :
Zhou, Bowen ; Hansen, John H. L.
Author_Institution :
Robust Speech Processing Group, University of Colorado, Boulder, CO, USA
Volume :
1
fYear :
2003
fDate :
6-10 April 2003
Abstract :
It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a fast speaker adaptation algorithm entitled Eigenspace Mapping (EigMap) is proposed and described. EigMap rapidly adapts the speaker-independent models by constructing discriminative acoustic models in the test speaker's eigenspace. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with superior performance to conventional adaptation techniques such as block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 seconds of adaptation data. It is also demonstrated that EigMap is additive to MLLR by encompassing the speaker-dependent discrimination information. A significant relative improvement of 24.6% over the baseline is observed by combining the MLLR and EigMap techniques.
Keywords :
acoustic signal processing; covariance matrices; eigenvalues and eigenfunctions; speaker recognition; EigMap; MLLR; acoustic environment; baseline models; baseline recognizer; correlation; discriminative acoustic model; discriminative acoustic models; eigendirections; eigenspace mapping; fast speaker adaptation algorithm; rapid speaker adaptation; speaker dependent discrimination; speaker environment; speaker independent models; time-invariant characteristics; unsupervised adaptation experiments; utterance covariance matrix; Acoustic testing; Linear discriminant analysis; Loudspeakers; Maximum likelihood decoding; Maximum likelihood linear regression; Natural languages; Robustness; Speech processing; Speech recognition; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198779
Filename :
1198779