A Generic and Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator Using Logic on Memory

Author

Bapat, Ojas A. ; Franzon, Paul D. ; Fastow, Richard M.

Author_Institution

Spansion Inc., Sunnyvale, CA, USA

Volume

22

Issue

12

fYear

2014

fDate

Dec. 2014

Firstpage

2701

Lastpage

2712

Abstract

This paper describes a scalable hardware accelerator for speech recognition that uses a two-pass decoding algorithm with word-dependent N-best Viterbi beam search. The observation probability calculation (senone scoring) and the first pass of decoding, which uses a bigram language model, are implemented in hardware. The word lattice output from the first pass is used by software for the second pass, with a trigram language model. The proposed design uses a logic-on-memory approach to exploit high-bandwidth NOR flash memory, improving random read performance for senone scoring and first-pass decoding, both of which are memory-intensive operations. The proposed HW/SW co-design achieves an overall speedup of 4.3× over a 2.4-GHz Intel Core 2 Duo processor running the CMU Sphinx speech recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator provides improved speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
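
To illustrate the beam-pruned Viterbi search named in the abstract, the following Python sketch runs a frame-synchronous pass over a toy HMM in the log domain. It is not the paper's hardware design or its word-dependent N-best variant: the function names `viterbi_beam` and `prune`, the `beam` parameter, and the dictionary-based model layout are all assumptions made for illustration, and the per-frame observation scores simply stand in for senone scores.

```python
import math


def prune(hyps, beam):
    """Keep only hypotheses within `beam` (log units) of the best score."""
    if not hyps:
        return hyps
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam(obs_scores, trans, init, beam):
    """Frame-synchronous Viterbi search with beam pruning (hypothetical sketch).

    obs_scores: list of dicts; obs_scores[t][state] is a log observation score
    trans:      dict of dicts; trans[prev][nxt] is a log transition probability
    init:       dict; init[state] is a log initial probability
    beam:       pruning width in log units
    Returns (best_state, best_log_score, best_path).
    Assumes at least one hypothesis survives every frame.
    """
    # Active hypotheses: state -> (log score, state path so far)
    active = {
        s: (init[s] + obs_scores[0].get(s, -math.inf), [s]) for s in init
    }
    active = prune(active, beam)

    for frame in obs_scores[1:]:
        nxt = {}
        for prev, (score, path) in active.items():
            for state, log_a in trans.get(prev, {}).items():
                cand = score + log_a + frame.get(state, -math.inf)
                # Viterbi recombination: keep the best path into each state
                if state not in nxt or cand > nxt[state][0]:
                    nxt[state] = (cand, path + [state])
        active = prune(nxt, beam)

    best_state = max(active, key=lambda s: active[s][0])
    best_score, best_path = active[best_state]
    return best_state, best_score, best_path
```

A narrower `beam` discards more hypotheses per frame, which reduces the number of memory reads for transitions and scores at the risk of pruning the path that would eventually have scored best.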

Keywords

hardware-software codesign; logic circuits; logic design; speech coding; speech recognition; Bigram language model; CMU Sphinx speech recognition software; HW/SW co-design; Intel Core 2 Duo processor; Senone scoring; acoustic models; first pass decoding; flash memory; frequency 2.4 GHz; generic scalable architecture; large acoustic model; large vocabulary speech recognition accelerator; logic-on-memory approach; memory intensive operations; observation probability calculation; power 1.72 W; read performance; scalable hardware accelerator; trigram language model; two pass decoding algorithm; word dependent N-best Viterbi beam search; word dictionaries; word lattice output; Acoustic beams; Acoustics; Decoding; Hardware; Hidden Markov models; Software; Speech recognition; Accelerator; N-best; beam search; embedded; hardware software co-design; logic on memory; multipass decoding; speech recognition; sphinx

fLanguage

English

Journal_Title

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Publisher

IEEE

ISSN

1063-8210

Type

jour

DOI

10.1109/TVLSI.2013.2296526

Filename

6718087