Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed-point DSP

Author

Gong, Yifan ; Kao, Yu-Hung

Author_Institution

Texas Instrum. Inc., Dallas, TX, USA

Volume

6

fYear

2000

fDate

2000

Firstpage

3686

Abstract

Continuous speech recognition is a resource-intensive algorithm. Commercial dictation software requires more than 10 Mbytes of disk space to install and 32 Mbytes of RAM to run. A typical embedded system cannot afford this much RAM because of its high cost and power consumption; it also lacks a disk to store the large amount of static data (e.g., acoustic models). We have been optimizing a small-vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP. This recognizer supports sophisticated continuous-density, tied-mixtures Gaussians, parallel model combination, and a noise-robust utterance detection algorithm. The fixed-point version achieves the same performance as the floating-point version. The algorithm runs in real time on a 100 MHz, 16-bit, fixed-point Texas Instruments TMS320C5410, even for the most challenging task: continuous digit dialing with a hands-free microphone in driving conditions.

Keywords

digital signal processing chips; fixed point arithmetic; microphones; random-access storage; signal detection; speech recognition; telephone sets; 100 MHz; 16 bit; RAM; Texas Instruments TMS320C5410; acoustic models; commercial dictation software; continuous density; continuous digit dialing; disk; driving conditions; embedded system; fixed-point DSP; hands-free microphone; high accuracy speech recognizer; noise-robust utterance detection algorithm; optimization; parallel model combination; performance; real-time algorithm; resource-intensive algorithm; small vocabulary speech recognizer; software design; speaker-independent continuous speech recognizer; tied-mixtures Gaussians; Application software; Costs; Digital signal processing; Embedded system; Energy consumption; Gaussian processes; Noise robustness; Power system modeling; Speech recognition; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location

Istanbul

ISSN

1520-6149

Print_ISBN

0-7803-6293-4

Type

conf

DOI

10.1109/ICASSP.2000.860202

Filename

860202