DocumentCode :
2876004
Title :
A long-contextual-span model of resonance dynamics for speech recognition: parameter learning and recognizer evaluation
Author :
Deng, Li ; Yu, Dong ; Li, Xiaolong ; Acero, Alex
Author_Institution :
Microsoft Res., Redmond, WA
fYear :
2005
fDate :
27 Nov. 2005
Firstpage :
145
Lastpage :
150
Abstract :
We present a structured speech model capable of jointly representing incomplete articulation and long-span co-articulation in natural human speech. Central to this model is a compact statistical parameterization of the highly regular dynamic patterns (exhibited in the hidden vocal-tract-resonance domain) that are driven by stochastic segmental targets. We provide a rigorous mathematical description of this model and present novel algorithms for learning the full set of model parameters from the cepstral data of speech. In particular, the gradient ascent techniques for learning the variance parameters (for both the resonance targets and the cepstral prediction residuals) are described in detail. Phonetic recognition experiments are carried out using two paradigms: N-best rescoring and lattice search. Both sets of results show higher recognition accuracy for the new model than for the best HMM system. The higher accuracy is observed consistently, with and without combining HMM scores, and with and without including the references in the N-best lists and lattices. Further, the new model, with its rich parameter-free structure, uses only context-independent, single-modal Gaussian parameters, which number fewer than one percent of the parameters in the context-dependent HMM system with mixture distributions.
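The abstract states that variance parameters for both the resonance targets and the cepstral prediction residuals are learned by gradient ascent. The Python sketch below illustrates that general idea under simplifying assumptions that are not taken from the paper: each frame contributes a scalar, zero-mean residual whose variance is a_k^2 * s_t + s_r, where s_t stands in for the target variance, s_r for the residual variance, and a_k is a hypothetical, known per-frame sensitivity of the prediction to the target. The names `log_likelihood`, `learn_variances`, `sens`, `s_t` and `s_r` are illustrative only and are not the authors' code.

```python
import math
import random


def log_likelihood(residuals, sens, s_t, s_r):
    """Gaussian log-likelihood of zero-mean residuals.

    Assumption (not from the paper): frame k has variance
    v_k = a_k**2 * s_t + s_r, where a_k is a known, hypothetical sensitivity.
    """
    total = 0.0
    for e, a in zip(residuals, sens):
        v = a * a * s_t + s_r
        total += -0.5 * (math.log(2.0 * math.pi * v) + e * e / v)
    return total


def learn_variances(residuals, sens, s_t=1.0, s_r=1.0, lr=0.1, steps=1500):
    """Gradient ascent on (log s_t, log s_r) to maximize log_likelihood."""
    u_t, u_r = math.log(s_t), math.log(s_r)  # log-parameters keep variances positive
    n = len(residuals)
    for _ in range(steps):
        s_t, s_r = math.exp(u_t), math.exp(u_r)
        g_t = 0.0
        g_r = 0.0
        for e, a in zip(residuals, sens):
            v = a * a * s_t + s_r
            # Derivative of the frame log-likelihood with respect to v_k.
            dl_dv = 0.5 * (e * e - v) / (v * v)
            g_t += dl_dv * a * a * s_t  # chain rule: dv_k/du_t = a_k**2 * s_t
            g_r += dl_dv * s_r          # chain rule: dv_k/du_r = s_r
        # Ascend along the per-frame average gradient.
        u_t += lr * g_t / n
        u_r += lr * g_r / n
    return math.exp(u_t), math.exp(u_r)


if __name__ == "__main__":
    # Synthetic data with known variances, for illustration only.
    random.seed(0)
    true_t, true_r = 2.0, 0.5
    sens = [random.uniform(0.2, 2.0) for _ in range(1000)]
    res = [random.gauss(0.0, math.sqrt(a * a * true_t + true_r)) for a in sens]
    est_t, est_r = learn_variances(res, sens)
    print(f"estimated s_t={est_t:.3f}, s_r={est_r:.3f}")
```

In this toy setup, each frame's variance is a sum of two terms, so the maximum-likelihood estimates have no simple closed form and an iterative gradient method is a natural fit. Optimizing the log-variances rather than the variances themselves keeps both estimates positive without explicit constraints.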
Keywords :
Gaussian processes; gradient methods; hidden Markov models; speech recognition; HMM system; N-best rescoring; gradient ascent techniques; hidden vocal-tract-resonance domain; lattice search; long-contextual-span model; natural human speech; parameter learning; phonetic recognition; recognizer evaluation; resonance dynamics; single-modal Gaussian parameters; Context modeling; Hidden Markov models; Humans; Psychoacoustic models; Resonance; Speech analysis; Speech recognition; Stochastic processes; Vectors; Video recording
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Conference_Location :
San Juan
Print_ISBN :
0-7803-9478-X
Electronic_ISBN :
0-7803-9479-8
Type :
conf
DOI :
10.1109/ASRU.2005.1566534
Filename :
1566534