Title :
Light supervision in acoustic model training
Author :
Nguyen, Long ; Xiang, Bing
Author_Institution :
BBN Technologies, Cambridge, MA, USA
Abstract :
We present a new light supervision method for automatically deriving additional acoustic training data for broadcast news transcription systems. A subset of the TDT corpus, which consists of broadcast audio with corresponding closed-caption (CC) transcripts, is identified by aligning the CC transcripts with the hypotheses generated by lightly supervised decoding. Phrases of three or more contiguous words on which the CC transcripts and the decoder's hypotheses agree are selected. The selection yields 702 hours, or 72% of the captioned data. When adding 700 hours of the selected data to the baseline 141-hour broadcast news training set, we achieved a 13% relative reduction in word error rate. The key to the effectiveness of this light supervision method is the use of a biased language model (LM) in the lightly supervised decoding. The biased LM, in which the CC transcripts are included with heavy weighting, helps select words that the recognizer might have misrecognized had a fair LM been used.
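A minimal sketch of the phrase-selection step described above, assuming Python's difflib as a stand-in for the paper's word-level alignment; the three-word threshold follows the abstract, and all names (select_agreeing_phrases, MIN_RUN) are illustrative rather than taken from the paper.

    # Select phrases of three or more contiguous words on which the
    # closed-caption (CC) transcript and the decoder hypothesis agree.
    from difflib import SequenceMatcher

    MIN_RUN = 3  # minimum length of an agreeing phrase, per the abstract

    def select_agreeing_phrases(cc_transcript, hypothesis):
        cc_words = cc_transcript.lower().split()
        hyp_words = hypothesis.lower().split()
        matcher = SequenceMatcher(a=cc_words, b=hyp_words, autojunk=False)
        phrases = []
        for block in matcher.get_matching_blocks():
            if block.size >= MIN_RUN:
                phrases.append(" ".join(cc_words[block.a:block.a + block.size]))
        return phrases

    if __name__ == "__main__":
        cc = "the president said the economy is growing at a record pace"
        hyp = "president said the economy was growing at a record pace today"
        for phrase in select_agreeing_phrases(cc, hyp):
            print(phrase)

On this toy pair the sketch prints the two agreeing runs ("president said the economy" and "growing at a record pace"); in the method described above, only the audio aligned to such runs would be kept as additional training data.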
Keywords :
acoustic signal processing; error statistics; learning (artificial intelligence); natural languages; speech recognition; text analysis; acoustic model training; biased language model; broadcast news transcription; closed-caption transcripts; large vocabulary speech recognition systems; light supervision; lightly-supervised decoding; word error rate; Broadcast technology; Decoding; Error analysis; Radio broadcasting; Speech; TV broadcasting; Training data; Vocabulary;
Conference_Title :
2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), Proceedings
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1325953