Author :
Chang, Pao-Chung ; Chen, Sin-Horng ; Juang, Biing-hwang
Abstract :
In a traditional speech recognition system, the distance score between a test token and a reference pattern is obtained by simply averaging the distortion sequence that results from matching the two patterns through a dynamic programming procedure. The final decision is made by choosing the reference pattern with the minimal average distance score. If one views the distortion sequence as a form of observed features, a decision rule based on a discriminant function designed specifically for the distortion sequence should perform better than one based on the simple average distortion. The authors therefore suggest a linear discriminant function of the form $\Delta = \sum_{i=1}^{T} w(i)\, d(i)$ to compute the distance score $\Delta$, instead of the direct average $\Delta = \frac{1}{T} \sum_{i=1}^{T} d(i)$. Several adaptive algorithms are proposed to learn the discriminant weighting function $w(i)$. These include one heuristic method, two methods based on the error propagation algorithm, and one method based on the generalized probabilistic descent (GPD) algorithm. The methods are studied in a speaker-independent speech recognition task involving utterances of the highly confusable English E-set (b, c, d, e, g, p, t, v, z). The results show that the best performance is obtained with the GPD method, which achieved 78.1% accuracy, compared with 67.6% for the traditional unweighted average method. Besides the experimental comparisons, an analytical discussion of the various training algorithms is also provided.
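The difference between the two scoring rules in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-reference weight layout, and all numeric values are hypothetical, the weights are hand-picked rather than learned by GPD or any of the other training methods, and every distortion sequence is assumed to have already been aligned to a fixed length T.

```python
from typing import Dict, Optional, Sequence


def average_distance(d: Sequence[float]) -> float:
    """Traditional score: plain mean of the distortion sequence d(1..T)."""
    return sum(d) / len(d)


def weighted_distance(d: Sequence[float], w: Sequence[float]) -> float:
    """Linear discriminant score: sum over i of w(i) * d(i)."""
    if len(d) != len(w):
        raise ValueError("distortion and weight sequences must have equal length")
    return sum(wi * di for wi, di in zip(w, d))


def classify(
    distortions: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, Sequence[float]]] = None,
) -> str:
    """Return the reference label with the minimal distance score.

    With weights=None the unweighted average is used; otherwise each
    reference is scored with its own (hypothetical) weighting function.
    """
    if weights is None:
        scores = {ref: average_distance(d) for ref, d in distortions.items()}
    else:
        scores = {
            ref: weighted_distance(d, weights[ref])
            for ref, d in distortions.items()
        }
    return min(scores, key=scores.get)


# Made-up distortion sequences of one test token against two references.
distortions = {"b": [0.2, 0.9, 0.3], "d": [0.5, 0.4, 0.4]}

# Made-up weights: reference "b" down-weights its unreliable middle frame,
# reference "d" keeps uniform weights 1/T (equivalent to the plain average).
weights = {"b": [0.5, 0.1, 0.4], "d": [1 / 3, 1 / 3, 1 / 3]}

unweighted_choice = classify(distortions)
weighted_choice = classify(distortions, weights)
```

With these illustrative numbers the two rules disagree: the unweighted average favours "d", while the weighted score favours "b". Setting every weight to 1/T recovers the traditional average, so the weighted rule is a strict generalization of it.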
Keywords :
speech recognition; English E-set; adaptive algorithms; decision rule; discriminant weighting function; discriminative analysis; distance score; distortion sequences; dynamic time warping algorithm; error propagation algorithm; generalized probabilistic descent algorithm; heuristic method; linear discriminant function; speaker-independent speech recognition; training algorithms; Adaptive algorithm; Distortion measurement; Dynamic programming; Hidden Markov models; Pattern matching; Pattern recognition; Speech analysis; Speech recognition; System testing; Vectors