DocumentCode
3162545
Title
Silence is golden: Modeling non-speech events in WFST-based dynamic network decoders
Author
Rybach, David ; Schlüter, Ralf ; Ney, Hermann
Author_Institution
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear
2012
fDate
25-30 March 2012
Firstpage
4205
Lastpage
4208
Abstract
Models for silence are a fundamental part of continuous speech recognition systems. Depending on application requirements, audio data segmentation, and availability of detailed training data annotations, it may be necessary or beneficial to differentiate between other non-speech events, for example breath and background noise. The integration of multiple non-speech models in a WFST-based dynamic network decoder is not straightforward, because these models do not fit perfectly into the transducer framework. This paper describes several options for the transducer construction with multiple non-speech models, shows their considerably different characteristics in memory and runtime efficiency, and analyzes the impact on the recognition performance.
Keywords
decoding; speech recognition; WFST-based dynamic network decoders; audio data segmentation; nonspeech event model; recognition performance; runtime efficiency; transducer construction; transducer framework; Context; Decoding; Hidden Markov models; Noise; Speech; Speech recognition; Transducers; LVCSR; WFST;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location
Kyoto
ISSN
1520-6149
Print_ISBN
978-1-4673-0045-2
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2012.6288846
Filename
6288846