DocumentCode :
70615
Title :
A Generic and Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator Using Logic on Memory
Author :
Bapat, Ojas A. ; Franzon, Paul D. ; Fastow, Richard M.
Author_Institution :
Spansion Inc., Sunnyvale, CA, USA
Volume :
22
Issue :
12
fYear :
2014
fDate :
Dec. 2014
Firstpage :
2701
Lastpage :
2712
Abstract :
This paper describes a scalable hardware accelerator for speech recognition, which uses a two-pass decoding algorithm with word-dependent N-best Viterbi beam search. The observation probability calculation (senone scoring) and the first pass of decoding, which uses a bigram language model, are implemented in hardware. The word lattice output from the first pass is used by software for the second pass, with a trigram language model. The proposed design uses a logic-on-memory approach to exploit high-bandwidth NOR flash memory, improving random read performance for senone scoring and first-pass decoding, both of which are memory-intensive operations. The proposed HW/SW co-design achieves an overall speedup of 4.3x over a 2.4-GHz Intel Core 2 Duo processor running the CMU Sphinx speech recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator provides improved speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
Keywords :
hardware-software codesign; logic circuits; logic design; speech coding; speech recognition; Bigram language model; CMU Sphinx speech recognition software; HW/SW co-design; Intel Core 2 Duo processor; Senone scoring; acoustic models; first pass decoding; flash memory; frequency 2.4 GHz; generic scalable architecture; large acoustic model; large vocabulary speech recognition accelerator; logic-on-memory approach; memory intensive operations; observation probability calculation; power 1.72 W; read performance; scalable hardware accelerator; trigram language model; two pass decoding algorithm; word dependent N-best Viterbi beam search; word dictionaries; word lattice output; Acoustic beams; Acoustics; Decoding; Hardware; Hidden Markov models; Software; Speech recognition; Accelerator; N-best; beam search; embedded; hardware software co-design; logic on memory; multipass decoding; speech recognition; sphinx
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2013.2296526
Filename :
6718087