A multiplatform speech recognition decoder based on weighted finite-state transducers

Author

Stoimenov, Emilian ; Schultz, Tanja

Author_Institution

Cognitive Syst. Labs., Univ. of Karlsruhe, Karlsruhe, Germany

fYear

2009

fDate

Nov. 13 2009-Dec. 17 2009

Firstpage

293

Lastpage

298

Abstract

Speech recognition decoders based on static graphs have recently been shown to significantly outperform the traditional approach of prefix tree expansion in terms of decoding speed. The reduced search effort makes static graph decoders an attractive alternative for tasks constrained by limited processing power or memory footprint on devices such as PDAs, internet tablets, and smart phones. In this paper we explore the benefits of decoding with an optimized speech recognition network over the fully task-optimized prefix-tree based decoder IBIS. We designed and implemented a new decoder called SWIFT (speedy weighted finite-state transducer), based on WFSTs, with its application to embedded platforms in mind. After describing the design and the network construction and storage process, we present evaluation results on a small task suitable for embedded applications and on a large task, namely the European Parliament Plenary Sessions (EPPS) task from the TC-STAR project. The SWIFT decoder is up to 50% faster than IBIS on both tasks. In addition, SWIFT achieves significant reductions in memory consumption through our innovative network-specific storage layout optimization.

Keywords

decoding; speech coding; speech recognition; European Parliament Plenary Sessions; PDA; internet tablets; multiplatform speech recognition decoder; network construction; prefix tree expansion; smart phones; speedy weighted finite-state transducer; static graph decoders; storage process; weighted finite-state transducers; Acoustic testing; Context modeling; Decoding; Fixed-point arithmetic; Internet; Personal digital assistants; Smart phones; Speech recognition; Transducers; Tree graphs;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Conference_Location

Merano

Print_ISBN

978-1-4244-5478-5

Electronic_ISBN

978-1-4244-5479-2

Type

conf

DOI

10.1109/ASRU.2009.5373404

Filename

5373404