DocumentCode :
1288317
Title :
Stochastic trajectory modeling and sentence searching for continuous speech recognition
Author :
Gong, Yifan
Author_Institution :
CRIN, Vandoeuvre les Nancy, France
Volume :
5
Issue :
1
fYear :
1997
fDate :
1/1/1997 12:00:00 AM
Firstpage :
33
Lastpage :
44
Abstract :
The paper first points out a defect in hidden Markov modeling (HMM) of continuous speech, referred to as the trajectory folding phenomenon. A new approach to modeling phoneme-based speech units is then proposed, which represents the acoustic observations of a phoneme as clusters of trajectories in a parameter space. The trajectories are modeled by a mixture of probability density functions of a random sequence of states. Each state is associated with a multivariate Gaussian density function, optimized at the state sequence level. Conditional trajectory duration probability is integrated into the modeling. An efficient sentence search procedure based on trajectory modeling is also formulated. Experiments on a speaker-dependent, 2010-word continuous speech recognition application with a word-pair perplexity of 50, using vocabulary-independent acoustic training and monophone models trained with 80 sentences per speaker, yielded a word error rate of about 1%. The new models were experimentally compared to continuous density mixture HMM (CDHMM) on the same recognition task and gave significantly smaller word error rates. These results suggest that the stochastic trajectory model provides a more in-depth modeling of continuous speech signals.
Keywords :
Gaussian processes; acoustic signal processing; hidden Markov models; parameter estimation; probability; random processes; search problems; speech processing; speech recognition; CDHMM; acoustic observations; conditional trajectory duration probability; continuous density mixture HMM; continuous speech signals; experiments; hidden Markov modeling; monophone models; multivariate Gaussian density function; parameter space; phoneme based speech units modeling; probability density functions; random state sequence; sentence searching; speaker dependent continuous speech recognition; stochastic trajectory modeling; trajectory folding phenomenon; vocabulary-independent acoustic training; word error rate; word-pair perplexity; Erbium; Error analysis; Hidden Markov models; Loudspeakers; Probability density function; Random sequences; Speech recognition; Stochastic processes; Training data; Turning;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.554267
Filename :
554267