DocumentCode :
1686465
Title :
Developing speech recognition systems for corpus indexing under the IARPA Babel program
Author :
Cui, Jia ; Cui, Xiaodong ; Ramabhadran, Bhuvana ; Kim, Jung-Ho ; Kingsbury, Brian ; Mamou, Jonathan ; Mangu, Lidia ; Picheny, Michael ; Sainath, Tara N. ; Sethy, Abhinav
Author_Institution :
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
fYear :
2013
Firstpage :
6753
Lastpage :
6757
Abstract :
Automatic speech recognition is a core component of many applications, including keyword search. In this paper we describe experiments on acoustic modeling, language modeling, and decoding for keyword search on a Cantonese conversational telephony corpus collected as part of the IARPA Babel program. We show that acoustic modeling techniques such as the bootstrapped-and-restructured model and deep neural network acoustic model significantly outperform a state-of-the-art baseline GMM/HMM model, in terms of both recognition performance and keyword search performance, with improvements of up to 11% relative character error rate reduction and 31% relative maximum term weighted value improvement. We show that while an interpolated Model M and neural network LM improve recognition performance, they do not improve keyword search results; however, the advanced LM does reduce the size of the keyword search index. Finally, we show that a simple form of automatically adapted keyword search performs 16% better than a preindexed search system, indicating that out-of-vocabulary search is still a challenge.
Keywords :
Gaussian processes; acoustic signal processing; hidden Markov models; interpolation; natural language processing; neural nets; search problems; speech coding; speech recognition; telephony; Cantonese conversational telephony corpus; GMM model; HMM model; IARPA Babel program; acoustic modeling; automatic speech recognition; decoding; keyword search performance; language modeling; model M interpolation; neural network LM; out-of-vocabulary search; relative character error rate reduction; Acoustics; Adaptation models; Decoding; Hidden Markov models; Keyword search; Lattices; Training; acoustic modeling; bootstrap; deep learning; keyword search; language modeling
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638969
Filename :
6638969