DocumentCode
417209
Title
Feature generation based on maximum normalized acoustic likelihood for improved speech recognition
Author
Li, Xiang ; Stern, Richard M.
Author_Institution
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
Feature representation strongly affects the performance of speech recognition systems. This paper focuses on a feature generation process based on a linear transformation of an original log-spectral representation. Whereas conventional linear feature generation methods generally use objective functions that are not closely related to recognition accuracy, the proposed method seeks a transformation matrix that maximizes the normalized acoustic likelihood of the most likely state sequence for the training data, a measure that is directly related to the classification error rate in speech recognition. The transformation matrix is obtained by gradient ascent, with the normalized acoustic likelihood of the most likely state sequence as the objective function. Experimental results on the DARPA Resource Management (RM) corpus show that the proposed method consistently yields lower word error rates than conventional linear feature generation methods.
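The optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that frame-level state labels (standing in for the most likely state sequence) are already given, that each state is modeled by a unit-covariance Gaussian with equal priors in the transformed space, and that the matrix is rescaled to unit Frobenius norm after every step so the objective cannot grow through scaling alone. The function names, the step-halving rule, and all parameters are hypothetical.

```python
import numpy as np

def normalized_log_likelihood(W, X, labels, means):
    """Mean over frames of log[ p(Wx | labeled state) / sum_s p(Wx | s) ].

    Assumption: unit-covariance Gaussians and equal state priors.
    """
    proj = (X[:, None, :] - means[None, :, :]) @ W.T      # (T, S, K)
    log_lik = -0.5 * np.sum(proj ** 2, axis=2)            # (T, S)
    log_norm = np.logaddexp.reduce(log_lik, axis=1)       # (T,)
    return np.mean(log_lik[np.arange(len(X)), labels] - log_norm)

def gradient(W, X, labels, means):
    """Gradient of normalized_log_likelihood with respect to W."""
    delta = X[:, None, :] - means[None, :, :]             # (T, S, D)
    proj = delta @ W.T                                    # (T, S, K)
    log_lik = -0.5 * np.sum(proj ** 2, axis=2)
    log_norm = np.logaddexp.reduce(log_lik, axis=1, keepdims=True)
    weights = np.exp(log_lik - log_norm)                  # state posteriors
    weights[np.arange(len(X)), labels] -= 1.0             # posterior minus one-hot
    # d/dW of -0.5 * ||W d||^2 is -(W d) d^T, combined over states and frames.
    return np.einsum("ts,tsk,tsd->kd", weights, proj, delta) / len(X)

def fit_transform(X, labels, n_out, steps=100, lr=0.1, seed=0):
    """Gradient ascent on W; a step is kept only if the objective improves."""
    rng = np.random.default_rng(seed)
    n_states = labels.max() + 1
    means = np.stack([X[labels == s].mean(axis=0) for s in range(n_states)])
    W = rng.standard_normal((n_out, X.shape[1]))
    W /= np.linalg.norm(W)                                # unit Frobenius norm
    best = normalized_log_likelihood(W, X, labels, means)
    history = [best]
    for _ in range(steps):
        cand = W + lr * gradient(W, X, labels, means)
        cand /= np.linalg.norm(cand)                      # project back (assumption)
        score = normalized_log_likelihood(cand, X, labels, means)
        if score > best:
            W, best = cand, score
        else:
            lr *= 0.5                                     # shrink a rejected step
        history.append(best)
    return W, history
```

Here `X` is a frames-by-dimensions array of log-spectral vectors and `labels` holds one integer state index per frame; `fit_transform(X, labels, n_out=2)` returns the learned matrix together with the objective value after each iteration.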
Keywords
error statistics; feature extraction; gradient methods; matrix algebra; maximum likelihood sequence estimation; signal representation; spectral analysis; speech recognition; state estimation; DARPA RM corpus; classification error rate; feature generation; feature representation; gradient ascent optimization process; improved speech recognition; linear transformation; log-spectral representation; maximum normalized acoustic likelihood; most likely state sequence; performance; state training data; transformation matrix; word error rates; Closed-form solution; Computer science; Covariance matrix; Iterative methods; Maximum likelihood estimation; Optimization methods; Partitioning algorithms; Probability distribution; Speech recognition; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326043
Filename
1326043