DocumentCode :
2424764
Title :
Rapid speaker adaptation for embedded large vocabulary dictation system with sparse training materials
Author :
Huang, Wei ; Zhang, Yaxin ; He, Xin ; Bao, Qingfeng
Author_Institution :
Motorola Labs. China Res. Center, Shanghai
fYear :
2008
fDate :
7-9 July 2008
Firstpage :
1069
Lastpage :
1072
Abstract :
In this paper, a novel tree-structural maximum a posteriori mapping (SMAP) algorithm is proposed for embedded large-vocabulary speech dictation systems. A two-level tree is created by grouping the Gaussian mixtures of a large HMM set into several classes, and an adaptation table is created for each class by applying MAP estimation to the Gaussian mixtures observed in the adaptation data. Based on the adaptation table, the remaining unobserved Gaussian mixtures are rapidly adapted at negligible computational cost. Experimental results show that SMAP outperforms conventional MAP estimation even with much less adaptation material. We have implemented this algorithm on a cellular phone. For 40 short adaptation utterances (about 200 words), the total computation time for adaptation is reduced from about 10 minutes to less than 8 seconds.
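The abstract does not give the update formulas, so the following is only a rough sketch of the general idea, not the authors' method. It assumes hard frame-to-Gaussian assignments, adapts means only, and assumes the per-class "adaptation table" is the average mean shift of the observed Gaussians in that class. The names `map_mean`, `adapt_means`, and the prior weight `tau` are hypothetical.

```python
from collections import defaultdict


def map_mean(prior_mean, frames, tau=10.0):
    """Standard MAP update of a Gaussian mean from hard-assigned frames.

    tau is the prior weight (an assumed hyperparameter).
    """
    n = len(frames)
    sums = [sum(f[d] for f in frames) for d in range(len(prior_mean))]
    return [(tau * m + s) / (tau + n) for m, s in zip(prior_mean, sums)]


def adapt_means(means, class_of, frames_by_gauss, tau=10.0):
    """means: {gauss_id: mean}, class_of: {gauss_id: class_id},
    frames_by_gauss: {gauss_id: [frame, ...]} for observed Gaussians only.
    """
    adapted = {}
    shifts = defaultdict(list)

    # Step 1: MAP-adapt each observed Gaussian, record its mean shift by class.
    for gid, frames in frames_by_gauss.items():
        if not frames:
            continue
        new_mean = map_mean(means[gid], frames, tau)
        adapted[gid] = new_mean
        shifts[class_of[gid]].append(
            [a - b for a, b in zip(new_mean, means[gid])]
        )

    # Step 2: build the per-class table (ASSUMPTION: average observed shift).
    table = {
        c: [sum(col) / len(rows) for col in zip(*rows)]
        for c, rows in shifts.items()
    }

    # Step 3: shift unobserved Gaussians by their class entry, if one exists.
    for gid, mean in means.items():
        if gid in adapted:
            continue
        shift = table.get(class_of[gid])
        if shift is None:
            adapted[gid] = list(mean)  # no data for this class: keep prior
        else:
            adapted[gid] = [m + s for m, s in zip(mean, shift)]

    return adapted, table
```

In this sketch the cost of step 3 is one vector addition per unobserved Gaussian, which is in the spirit of the "negligible computation cost" the abstract reports, though the actual table contents in the paper may differ.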
Keywords :
Gaussian processes; dictation; hidden Markov models; speaker recognition; Gaussian mixtures; HMM; sparse training materials; speaker adaptation; tree-structural maximum a posteriori mapping; vocabulary dictation system; Biological materials; Computational efficiency; Helium; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Partial response channels; Speech; Training data; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-1723-0
Electronic_ISBN :
978-1-4244-1724-7
Type :
conf
DOI :
10.1109/ICALIP.2008.4590110
Filename :
4590110