DocumentCode :
2176258
Title :
A multi-stream ASR framework for BLSTM modeling of conversational speech
Author :
Wöllmer, Martin ; Eyben, Florian ; Schuller, Björn ; Rigoll, Gerhard
Author_Institution :
Inst. for Human-Machine Commun., Tech. Univ. München, Munich, Germany
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
4860
Lastpage :
4863
Abstract :
We propose a novel multi-stream framework for continuous conversational speech recognition which employs bidirectional Long Short-Term Memory (BLSTM) networks for phoneme prediction. The BLSTM architecture allows recurrent neural nets to model long-range context, which led to improved ASR performance when combined with conventional triphone modeling in a Tandem system. In this paper, we extend the principle of joint BLSTM and triphone modeling to a multi-stream system which uses MFCC features and BLSTM predictions as observations originating from two independent data streams. Using the COSINE database, we show that this technique prevails over a recently proposed single-stream Tandem system as well as over a conventional HMM recognizer.
Keywords :
speech recognition; BLSTM modeling; COSINE database; HMM recognizer; bidirectional long short-term memory networks; continuous conversational speech recognition; conversational speech; independent data streams; multistream ASR framework; single-stream tandem system; triphone modeling; Computer architecture; Context; Hidden Markov models; Logic gates; Recurrent neural networks; Speech; Speech recognition; Context Modeling; Conversational Speech Recognition; Long Short-Term Memory; Recurrent Neural Networks;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947444
Filename :
5947444