Title :
A modular RNN-based method for continuous Mandarin speech recognition
Author :
Liao, Yuan-Fu ; Chen, Sin-Horng
Author_Institution :
Dept. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Date :
3/1/2001
Abstract :
A new modular recurrent neural network (MRNN)-based method for continuous Mandarin speech recognition (CMSR) is proposed. The MRNN recognizer is composed of four main modules. The first is a sub-MRNN module that generates discriminant functions for all 412 base-syllables, using four recurrent neural network (RNN) submodules. The second is an RNN module that detects syllable boundaries, providing timing cues to help solve the time-alignment problem. The third is also an RNN module; it generates discriminant functions for 143 intersyllable diphone-like units to compensate for the intersyllable coarticulation effect. The fourth is a dynamic programming (DP)-based recognition search module, which integrates the other three modules and solves the time-alignment problem to generate the recognized base-syllable sequence. A new multilevel pruning scheme designed to speed up the recognition process is also proposed. The whole MRNN can be trained by a three-stage minimum classification error/generalized probabilistic descent (MCE/GPD) algorithm. Experimental results showed that the proposed method performed better than the maximum likelihood (ML)-trained hidden Markov model (HMM) method and was comparable to the MCE/GPD-trained HMM method. The multilevel pruning scheme was also found to be very efficient.
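To illustrate the general idea of a DP-based search that combines per-frame syllable scores with boundary timing cues, here is a minimal sketch. It is not the authors' algorithm: the function name `dp_syllable_search`, the additive log-score combination, and the `min_dur`/`max_dur` duration limits are all assumptions made for illustration, and the intersyllable diphone-like module and the multilevel pruning scheme are left out.

```python
from math import inf

def dp_syllable_search(syl_scores, boundary_scores, min_dur=1, max_dur=4):
    """Hypothetical segmental DP search (illustrative only).

    syl_scores[t][k]   : log-score of syllable k at frame t (assumed input)
    boundary_scores[t] : log-score that a syllable ends at frame t (assumed input)
    Returns (total_score, [(start, end, syllable_index), ...]) with end exclusive.
    """
    T = len(syl_scores)
    K = len(syl_scores[0])

    # Prefix sums so that the score of any segment is a simple difference.
    prefix = [[0.0] * K]
    for row in syl_scores:
        prefix.append([p + s for p, s in zip(prefix[-1], row)])

    best = [-inf] * (T + 1)   # best[t]: best score of a path covering frames [0, t)
    best[0] = 0.0
    back = [None] * (T + 1)   # back[t]: (segment start, syllable index)

    for end in range(1, T + 1):
        for dur in range(min_dur, min(max_dur, end) + 1):
            start = end - dur
            if best[start] == -inf:
                continue  # no valid path reaches this start frame
            # Summed score of each syllable over frames [start, end).
            seg = [prefix[end][k] - prefix[start][k] for k in range(K)]
            k = max(range(K), key=seg.__getitem__)
            # Assumption: scores combine additively in the log domain.
            score = best[start] + seg[k] + boundary_scores[end - 1]
            if score > best[end]:
                best[end] = score
                back[end] = (start, k)

    if back[T] is None:
        raise ValueError("no segmentation satisfies the duration limits")

    # Trace back from the last frame to recover the syllable sequence.
    path = []
    t = T
    while t > 0:
        start, k = back[t]
        path.append((start, t, k))
        t = start
    path.reverse()
    return best[T], path
```

In this sketch the boundary scores act as the timing cue: a segmentation is rewarded when its segment ends coincide with frames where the boundary detector is confident, which narrows the time-alignment choices the search has to consider.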
Keywords :
dynamic programming; natural languages; recurrent neural nets; search problems; speech recognition; Chinese; MCE/GPD algorithm; MRNN recognizer; base-syllable sequence; base-syllables; continuous Mandarin speech recognition; discriminant functions; dynamic programming-based recognition search module; intersyllable coarticulation effect; intersyllable diphone-like units; modular RNN-based method; modular recurrent neural network; multilevel pruning scheme; sub-MRNN module; submodules; syllable boundaries; three-stage minimum classification error/generalized probabilistic descent algorithm; time-alignment; Artificial neural networks; Context modeling; Dynamic programming; Hidden Markov models; Maximum likelihood detection; Multilayer perceptrons; Pattern recognition; Recurrent neural networks; Speech recognition; Timing;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on