Flexible Multi-Stream Framework for Speech Recognition using Multi-Tape Finite-State Transducers

Author

Hetherington, I. Lee; Shu, Han; Glass, James R.

Author_Institution

Comput. Sci. & Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA

Volume

1

fYear

2006

fDate

14-19 May 2006

Abstract

We present an approach to general multi-stream recognition utilizing multi-tape finite-state transducers (FSTs). The approach is novel in that each of the multiple "streams" of features can represent either a sequence (e.g., fixed- or variable-rate frames) or a directed acyclic graph (e.g., containing hypothesized phonetic segmentations). Each transition of the multi-tape FST specifies the models to be applied to each stream and the degree of feature stream asynchrony to be allowed. We show how this framework can easily represent the 2-stream variable-rate landmark and segment modeling utilized by our baseline SUMMIT speech recognizer. We present experiments merging standard hidden Markov models (HMMs) with landmark models on the Wall Street Journal speech recognition task, and find that some degree of asynchrony can be critical when combining different types of models. We also present experiments performing audio-visual speech recognition on the AV-TIMIT task.
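The abstract states that each multi-tape FST transition names a model per stream and bounds the allowed asynchrony between streams. The following is a minimal sketch of that idea, not the authors' implementation. It assumes both streams are plain sequences (not DAGs) with known observation end times, that a stream whose label is empty (`None`) does not advance on that transition, and that asynchrony is measured as the spread of current times across streams. `Arc`, `take_arc`, `max_skew`, and the model labels are hypothetical names; weights and output labels are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Arc:
    """Hypothetical multi-tape FST transition (one input tape per feature stream)."""
    src: int
    dst: int
    labels: Tuple[Optional[str], ...]  # model per stream; None = stream does not advance
    max_skew: float  # assumed: max time spread (seconds) allowed between streams


def take_arc(arc: Arc,
             positions: Tuple[int, ...],
             times: Sequence[Sequence[float]]) -> Optional[Tuple[int, ...]]:
    """Advance per-stream positions along `arc`.

    positions[k] counts observations already consumed from stream k;
    times[k] holds the sorted end times of stream k's observations.
    Returns the new positions, or None if a stream would run past its end
    or the asynchrony bound would be violated.
    """
    if not (len(arc.labels) == len(positions) == len(times)):
        raise ValueError("arc, positions and times must cover the same streams")

    new_pos = []
    for k, label in enumerate(arc.labels):
        p = positions[k] + (0 if label is None else 1)
        if p > len(times[k]):
            return None  # this stream has no observation left to consume
        new_pos.append(p)

    # Current time of each stream: end of its last consumed observation (0.0 if none).
    cur = [times[k][p - 1] if p > 0 else 0.0 for k, p in enumerate(new_pos)]
    if max(cur) - min(cur) > arc.max_skew:
        return None  # streams would drift further apart than this arc allows
    return tuple(new_pos)
```

For example, with a 10 ms audio stream and a roughly 30 frames-per-second video stream, an arc that advances both streams with `max_skew=0.05` is accepted from `(0, 0)`, while the same arc with `max_skew=0.0` (strict synchrony) is rejected because the first audio and video observations end at different times. An arc labelled only on the audio tape lets the audio stream advance alone, which is one way a transition could encode a permitted degree of asynchrony.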

Keywords

directed graphs; hidden Markov models; speech recognition; transducers; 2-stream variable-rate landmark; AV-TIMIT task; Wall Street Journal speech recognition task; audio-visual speech recognition; baseline SUMMIT speech recognizer; directed acyclic graph; feature stream asynchrony; fixed-rate frame; flexible multi-stream framework; general multi-stream recognition; hypothesized phonetic segmentations; multi-tape finite-state transducers; segment modeling; standard hidden Markov models; variable-rate frame; Artificial intelligence; Computer science; Context modeling; Glass; Hidden Markov models; Laboratories; Merging; Speech recognition; Streaming media; Transducers

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1660046

Filename

1660046