Title :
Discriminative acoustic model using eigenspace mapping for rapid speaker adaptation
Author :
Zhou, Bowen ; Hansen, John H. L.
Author_Institution :
Robust Speech Process. Group, Colorado Univ., Boulder, CO, USA
Abstract :
It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker-dependent. Based on this observation, a fast speaker adaptation algorithm called Eigenspace Mapping (EigMap) is proposed and described. EigMap rapidly adapts the speaker-independent models by constructing discriminative acoustic models in the test speaker's eigenspace. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with superior performance to conventional adaptation techniques such as block diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 seconds of adaptation data. It is also demonstrated that EigMap is additive to MLLR by encompassing the speaker-dependent discrimination information. A significant relative improvement of 24.6% over the baseline is observed by combining the MLLR and EigMap techniques.
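As an illustration of the eigen-analysis the abstract refers to, the following is a minimal sketch of extracting the leading eigendirections of an utterance covariance matrix. It is not the EigMap algorithm itself, which the abstract does not specify in detail. The function name `leading_eigendirections`, the 13-dimensional features, the 450 frames (roughly 4.5 seconds if a 10 ms frame shift is assumed), and the random data are all hypothetical choices made for this example.

```python
import numpy as np

def leading_eigendirections(frames, k=3):
    """Return the k largest eigenvalues and their eigenvectors for the
    covariance matrix of `frames`.

    frames: array of shape (T, D), one D-dimensional feature vector per frame.
    """
    frames = np.asarray(frames, dtype=float)
    # Covariance over the frames of the utterance; columns are feature dims.
    cov = np.cov(frames, rowvar=False)            # shape (D, D)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return eigvals[order], eigvecs[:, order]

# Hypothetical stand-in for real acoustic features (assumption, not real data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(450, 13)) * np.linspace(3.0, 0.5, 13)

vals, dirs = leading_eigendirections(frames, k=3)
print(vals.shape, dirs.shape)
```

The columns of `dirs` are orthonormal directions ordered by decreasing variance. How such directions are then used to build discriminative models in the test speaker's eigenspace is described in the paper, not in this sketch.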
Keywords :
acoustic signal processing; covariance matrices; eigenvalues and eigenfunctions; speaker recognition; EigMap; MLLR; acoustic environment; baseline models; baseline recognizer; correlation; discriminative acoustic model; discriminative acoustic models; eigendirections; eigenspace mapping; fast speaker adaptation algorithm; rapid speaker adaptation; speaker dependent discrimination; speaker environment; speaker independent models; time-invariant characteristics; unsupervised adaptation experiments; utterance covariance matrix; Acoustic testing; Linear discriminant analysis; Loudspeakers; Maximum likelihood decoding; Maximum likelihood linear regression; Natural languages; Robustness; Speech processing; Speech recognition; Training data;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198779