DocumentCode :
2703664
Title :
Efficient, Low Latency Adaptation for Speech Recognition
Author :
Kozat, Suleyman S. ; Visweswariah, K. ; Gopinath, Rahul
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
Constrained or feature space maximum likelihood linear regression (FMLLR) is known to be an effective algorithm for adaptation to a new speaker or environment. It employs a single transformation matrix and bias vector to linearly transform the test speaker's features. FMLLR makes no assumption on the underlying noise, environment or speaker and estimates parameters to maximize likelihood of the test data. The standard implementation needs considerable computational power, requires significant amounts of storage, and requires a first-pass decoding before adaptation can begin. In this paper, we propose a simplified implementation of FMLLR for embedded applications to address these problems. Here, we employ a simple speech/silence segmentation to estimate parameters. We operate in the 13-dimensional cepstral space, hence resource requirements are low. The algorithm does not require a first-pass decoding (parameter estimation is accomplished entirely in the front end) and can be applied with low latency as compared to FMLLR. The algorithms we describe here provide an attractive tradeoff between the power of FMLLR and the computational simplicity of Cepstral Mean Subtraction. With minimal cost, we achieve nearly 15% relative gains on an embedded speech recognition task.
Keywords :
decoding; maximum likelihood estimation; regression analysis; speech coding; speech recognition; cepstral space; decoding; embedded applications; maximum likelihood linear regression; speech recognition; speech-silence segmentation; Cepstral analysis; Costs; Delay; Maximum likelihood decoding; Maximum likelihood linear regression; Parameter estimation; Speech recognition; Testing; Vectors; Working environment noise; Speech enhancement; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367028
Filename :
4218216