Title :
A Generic and Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator Using Logic on Memory
Author :
Bapat, Ojas A. ; Franzon, Paul D. ; Fastow, Richard M.
Author_Institution :
Spansion Inc., Sunnyvale, CA, USA
Abstract :
This paper describes a scalable hardware accelerator for speech recognition that uses a two-pass decoding algorithm with word-dependent N-best Viterbi beam search. The observation probability calculation (senone scoring) and the first pass of decoding, which uses a bigram language model, are implemented in hardware. The word lattice output from the first pass is used by software for the second pass, which applies a trigram language model. The proposed design uses a logic-on-memory approach that exploits high-bandwidth NOR flash memory to improve random read performance for senone scoring and first-pass decoding, both of which are memory-intensive operations. The proposed HW/SW co-design achieves an overall speedup of 4.3x over a 2.4-GHz Intel Core 2 Duo processor running the CMU Sphinx speech recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator provides improved speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
Keywords :
hardware-software codesign; logic circuits; logic design; speech coding; speech recognition; Bigram language model; CMU Sphinx speech recognition software; HW/SW co-design; Intel Core 2 Duo processor; Senone scoring; acoustic models; first pass decoding; flash memory; frequency 2.4 GHz; generic scalable architecture; large acoustic model; large vocabulary speech recognition accelerator; logic-on-memory approach; memory intensive operations; observation probability calculation; power 1.72 W; read performance; scalable hardware accelerator; trigram language model; two pass decoding algorithm; word dependent N-best Viterbi beam search; word dictionaries; word lattice output; Acoustic beams; Acoustics; Decoding; Hardware; Hidden Markov models; Software; Speech recognition; Accelerator; N-best; beam search; embedded; hardware software co-design; logic on memory; multipass decoding; speech recognition; sphinx
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
DOI :
10.1109/TVLSI.2013.2296526
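
Illustrative sketch (not from the paper): the abstract's first decoding pass relies on Viterbi beam search over per-frame observation scores. The Python below is a minimal software sketch of beam-pruned Viterbi on a toy two-state model, assuming log-domain scores. It does not reproduce the paper's hardware design, its word-dependent N-best bookkeeping, or its bigram language model. The function names (`viterbi_beam`, `prune`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")


def prune(hyps, beam):
    """Keep only hypotheses within `beam` (log domain) of the best score."""
    if not hyps:
        return hyps
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam(log_init, log_trans, log_obs, beam):
    """Return (best_log_score, best_state_path) using beam-pruned Viterbi.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log observation score of state s at frame t
    beam            : hypotheses scoring below (frame best - beam) are dropped
    """
    n_states = len(log_init)

    # Active hypotheses: state -> (log score, state path so far).
    active = {}
    for s in range(n_states):
        score = log_init[s] + log_obs[0][s]
        if score > NEG_INF:
            active[s] = (score, [s])
    active = prune(active, beam)

    for t in range(1, len(log_obs)):
        nxt = {}
        # Only surviving (unpruned) states are expanded at each frame.
        for p, (p_score, p_path) in active.items():
            for s in range(n_states):
                score = p_score + log_trans[p][s] + log_obs[t][s]
                if score == NEG_INF:
                    continue
                # Viterbi recombination: keep the best path into each state.
                if s not in nxt or score > nxt[s][0]:
                    nxt[s] = (score, p_path + [s])
        active = prune(nxt, beam)

    if not active:
        raise ValueError("all hypotheses were pruned")
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state]


def to_log(rows):
    """Convert a list of probability rows to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy model: 2 states, 3 frames (values are made up for illustration).
log_init = [math.log(0.6), math.log(0.4)]
log_trans = to_log([[0.7, 0.3], [0.4, 0.6]])
log_obs = to_log([[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]])

score, path = viterbi_beam(log_init, log_trans, log_obs, beam=math.log(2))
```

In this toy case the narrow beam discards the weaker state at the first two frames yet still returns the same path as an effectively unpruned search. In general, a beam that is too tight can prune the eventual best path, which is the usual accuracy versus work trade-off of beam search.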