Title :
Simultaneous ANN feature and HMM recognizer design using string-based minimum classification error (MCE) training
Author :
Rahim, M.G. ; Lee, Chin-Hui
Author_Institution :
Res. Lab., AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
Conventional features used in state-of-the-art hidden Markov model (HMM) based speech recognition systems are commonly inspired by scientific knowledge of, and expertise in, the human vocal and auditory systems. Although the intent of feature analysis is to extract “relevant” and “discriminative” information from the signal that is useful for speech recognition, this information may not be consistent with the objective of minimizing the error rate in the recognition process. We utilize feedforward artificial neural networks (ANNs) to generate a new class of features for speech recognition. We propose a system that integrates the feature extraction process with the recognition process under a unified statistical framework, with a consistent objective function designed to minimize the recognition error rate. Results on a telephone-based, speaker-independent connected digit task indicate that this integrated system with 12 ANNs reduces the per-digit error rate by a further 28% relative to a similar system using a single ANN, and by 16% relative to our previous best results, in which feature transformation was not incorporated.
Keywords :
errors; feature extraction; feedforward neural nets; hidden Markov models; learning (artificial intelligence); pattern classification; speech recognition; string matching; HMM based speech recognition systems; consistent objective function; discriminative information; error rate; feature analysis; feature extraction process; feature transformation; feedforward artificial neural networks; per digit error rate; recognition error rate; simultaneous ANN feature/HMM recognizer design; state of the art hidden Markov model; string based minimum classification error training; telephone based speaker independent connected digit task; unified statistical framework; Artificial neural networks; Auditory system; Error analysis; Hidden Markov models; Human voice; Information analysis; Performance analysis; Signal analysis; Speech analysis; Speech recognition;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607985