Title :
WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding
Author :
Hoffmeister, Björn ; Heigold, Georg ; Rybach, David ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Yap, Inc., Charlotte, NC, USA
Abstract :
During the last decade, weighted finite-state transducers (WFSTs) have become popular in speech recognition. While their main field of application remains hidden Markov model (HMM) decoding, the WFST framework is now also seen as a building block in solutions to many other central problems in automatic speech recognition (ASR). These solutions are less well known, and this work aims to give an overview of the applications of WFSTs in large-vocabulary continuous speech recognition (LVCSR) besides HMM decoding: discriminative acoustic model training, Bayes risk decoding, and system combination. The application of the WFST framework has considerable practical impact: we show how the framework helps to structure problems, to develop generic solutions, and to delegate complex computations to WFST toolkits. In this paper, we review the literature, discuss existing approaches, and provide new insights into WFST-enabled solutions. We also present a novel, purely WFST-based algorithm for computing the exact Bayes risk hypothesis from a lattice with the Levenshtein distance as the loss function. We present the problems and their solutions in a unified framework and discuss the advantages and limitations of using WFSTs. We do not provide new experimental results, but refer to the existing literature. Our work helps to identify where and how the transducer framework can contribute to a compact and generic solution to LVCSR problems.
Keywords :
decoding; hidden Markov models; speech coding; speech recognition; ASR problems; Bayes risk decoding; HMM decoding; Levenshtein distance; WFST enabled solutions; automatic speech recognition; discriminative acoustic model training; hidden Markov model decoding; large vocabulary continuous speech recognition; loss function; system combination; weighted finite state transducers; Lattices; Maximum likelihood decoding; Training; Transducers; discriminative training; weighted finite-state transducer (WFST)
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2011.2162402