DocumentCode :
394231
Title :
Conversational telephone speech recognition
Author :
Gauvain, J.L. ; Lamel, L. ; Schwenk, H. ; Adda, G. ; Chen, L. ; Lefèvre, F.
Author_Institution :
Spoken Language Process. Group, LIMSI-CNRS, Orsay, France
Volume :
1
fYear :
2003
fDate :
6-10 April 2003
Abstract :
This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the acoustic side include the use of speaker normalization (VTLN), the need to cope with channel variability, and the need for efficient speaker adaptation and better pronunciation modeling. On the linguistic side the primary challenge is to cope with the limited amount of language model training data. To address this issue we make use of a data selection technique, and a smoothing technique based on a neural network language model. At the decoding level lattice rescoring and minimum word error decoding are applied. On the development data, the improvements yield an overall word error rate of 24.9% whereas the original BN transcription system had a word error rate of about 50% on the same data.
Keywords :
decoding; neural nets; normalising; speech recognition; BN transcription system; acoustic modeling; broadcast news transcription system; channel variability; data selection technique; decoding; language model training data; language modeling; lattice rescoring; minimum word error decoding; neural network language model; pronunciation modeling; smoothing technique; speaker adaptation; speaker normalization; speech recognition system; telephone conversations; word error rate; Broadcasting; Decoding; Error analysis; Loudspeakers; Natural languages; Neural networks; Smoothing methods; Speech recognition; Telephony; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198755
Filename :
1198755