DocumentCode
418166
Title
Scalable architecture for word HMM-based speech recognition
Author
Yoshizawa, Shingo ; Wada, Naoya ; Hayasaka, Noboru ; Miyanaga, Yoshikazu
Author_Institution
Graduate Sch. of Eng., Hokkaido Univ., Sapporo, Japan
Volume
3
fYear
2004
fDate
23-26 May 2004
Abstract
This paper presents a scalable architecture for real-time speech recognizers based on word HMMs (hidden Markov models). HMM-based recognition algorithms use one of two kinds of acoustic model: phoneme-level models and word-level models. Phoneme-level HMMs are widely used in current speech recognition systems because they support large vocabularies. Word-level HMMs, in contrast, have been restricted to small vocabularies because of their extremely high computation cost, despite their excellent recognition performance. To overcome this limitation, we adopt a scalable architecture tailored to the word HMM structure. The proposed architecture can flexibly improve recognition performance and extend the vocabulary, while the computation time hardly increases. To demonstrate a practical solution, we have designed and evaluated a complete recognizer system, including speech analysis and noise robustness, using a 0.18 μm CMOS standard cell library. The recognition time is 35.7 μs per word at an operating frequency of 128 MHz, so the recognizer can handle vocabularies of medium size and larger with real-time response.
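To make the word-HMM approach concrete, the sketch below scores an utterance against each vocabulary word's own left-to-right HMM with the Viterbi algorithm and returns the best-scoring word. It is a minimal illustration under stated assumptions, not the authors' hardware design: the 1-D Gaussian output densities, the fixed self-loop/next-state transition probabilities, and the names `viterbi_log_score`, `recognize` and `cycles_per_word` are all hypothetical. It also converts the reported 35.7 μs/word at 128 MHz into clock cycles per word.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")

# Assumption: each HMM state emits a scalar feature from a 1-D Gaussian
# (a toy stand-in for real acoustic output distributions).
State = Tuple[float, float]  # (mean, variance)


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_log_score(features: Sequence[float], states: Sequence[State],
                      log_self: float, log_next: float) -> float:
    """Best-path log likelihood through one left-to-right word HMM.

    The path starts in state 0 and must end in the last state; each step
    either stays in the current state or advances to the next one.
    """
    n = len(states)
    # delta[j]: best log score of any path ending in state j at this frame.
    delta = [NEG_INF] * n
    delta[0] = gaussian_log_pdf(features[0], *states[0])
    for x in features[1:]:
        new = [NEG_INF] * n
        for j in range(n):
            stay = delta[j] + log_self
            move = delta[j - 1] + log_next if j > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                new[j] = best + gaussian_log_pdf(x, *states[j])
        delta = new
    return delta[-1]  # -inf if the utterance is too short to reach the end


def recognize(features: Sequence[float], word_models: Dict[str, List[State]],
              log_self: float = math.log(0.6),
              log_next: float = math.log(0.4)) -> str:
    """Score every word HMM independently and return the best word."""
    scores = {word: viterbi_log_score(features, states, log_self, log_next)
              for word, states in word_models.items()}
    return max(scores, key=scores.get)


def cycles_per_word(time_per_word_s: float, clock_hz: float) -> float:
    """Clock cycles spent per word for a given recognition time and clock."""
    return time_per_word_s * clock_hz


if __name__ == "__main__":
    # Hypothetical two-word vocabulary with two-state models.
    models = {"yes": [(0.0, 1.0), (5.0, 1.0)],
              "no": [(5.0, 1.0), (0.0, 1.0)]}
    print(recognize([0.1, -0.2, 4.9, 5.1], models))
    print(cycles_per_word(35.7e-6, 128e6))
```

Because each word model is scored independently, cost grows with vocabulary size in a plain software loop; the arithmetic above simply restates the abstract's figures as roughly 4,570 clock cycles per word.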
Keywords
CMOS integrated circuits; audio signal processing; hidden Markov models; real-time systems; speech recognition; vocabulary; 0.18 micron; 128 MHz; CMOS standard cell library; HMM-based recognition algorithm; acoustic model; hidden Markov model; noise robustness; operating frequency; phoneme-level HMM; real-time speech recognizer; recognition time; scalable speech recognition architecture; speech analysis; speech recognition system; word HMM-based speech recognition; word vocabulary extension; word-level HMM; Computational efficiency; Computer architecture; Frequency; Hidden Markov models; High performance computing; Libraries; Noise robustness; Speech analysis; Speech recognition; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on
Print_ISBN
0-7803-8251-X
Type
conf
DOI
10.1109/ISCAS.2004.1328772
Filename
1328772