DocumentCode :
454585
Title :
Flexible Multi-Stream Framework for Speech Recognition using Multi-Tape Finite-State Transducers
Author :
Hetherington, I. Lee ; Shu, Han ; Glass, James R.
Author_Institution :
Comput. Sci. & Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA
Volume :
1
fYear :
2006
fDate :
14-19 May 2006
Abstract :
We present an approach to general multi-stream recognition utilizing multi-tape finite-state transducers (FSTs). The approach is novel in that each of the multiple "streams" of features can represent either a sequence (e.g., fixed- or variable-rate frames) or a directed acyclic graph (e.g., containing hypothesized phonetic segmentations). Each transition of the multi-tape FST specifies the models to be applied to each stream and the degree of feature stream asynchrony to be allowed. We show how this framework can easily represent the 2-stream variable-rate landmark and segment modeling utilized by our baseline SUMMIT speech recognizer. We present experiments merging standard hidden Markov models (HMMs) with landmark models on the Wall Street Journal speech recognition task, and find that some degree of asynchrony can be critical when combining different types of models. We also present experiments performing audio-visual speech recognition on the AV-TIMIT task.
Keywords :
directed graphs; hidden Markov models; speech recognition; transducers; 2-stream variable-rate landmark; AV-TIMIT task; Wall Street Journal speech recognition task; audio-visual speech recognition; baseline SUMMIT speech recognizer; directed acyclic graph; feature stream asynchrony; fixed-rate frame; flexible multi-stream framework; general multi-stream recognition; hypothesized phonetic segmentations; multi-tape finite-state transducers; segment modeling; speech recognition; standard hidden Markov models; variable-rate frame; Artificial intelligence; Computer science; Context modeling; Glass; Hidden Markov models; Laboratories; Merging; Speech recognition; Streaming media; Transducers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
ISSN :
1520-6149
Print_ISBN :
1-4244-0469-X
Type :
conf
DOI :
10.1109/ICASSP.2006.1660046
Filename :
1660046