Regional Information Center for Science and Technology (RICEST) - Feature generation based on maximum normalized acoustic likelihood for improved speech recognition

DocumentCode :

417209

Title :

Feature generation based on maximum normalized acoustic likelihood for improved speech recognition

Author :

Li, Xiang ; Stern, Richard M.

Author_Institution :

Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA

Volume :

fYear :

2004

fDate :

17-21 May 2004

Abstract :

Feature representation is a very important factor that has a great effect on the performance of speech recognition systems. In this paper, we focus on a feature generation process that is based on the linear transformation of an original log-spectral representation. While conventional linear feature generation methods generally use objective functions that are not closely related to recognition accuracy, our linear feature generation method attempts to find a transformation matrix that maximizes the normalized acoustic likelihood of the most likely state sequence of the training data, a measure that is directly related to the classification error rate in speech recognition. The transformation matrix is generated using a gradient ascent optimization process, with the normalized acoustic likelihood of the most likely state sequence as the objective function. Experimental results using the DARPA RM corpus show that the proposed method consistently decreases word error rates compared to conventional linear feature generation methods.
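
The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and the record does not give their formulas. The sketch rests on several simplifying assumptions that are not in the source: each state is a Gaussian with identity covariance in the transformed space, state priors are equal, state means are the class averages of the input vectors, and fixed per-frame state labels stand in for the most likely state sequence. Under these assumptions the normalized acoustic likelihood of a frame is the likelihood of its labeled state divided by the sum of likelihoods over all states, and the matrix `A` is updated by gradient ascent with a simple step-halving safeguard. All function names (`make_toy_data`, `objective_and_gradient`, `fit_transform`) are hypothetical.

```python
import numpy as np


def make_toy_data(n=300, n_states=3, d_in=6, seed=0):
    """Synthetic stand-in for log-spectral frames with known state labels."""
    rng = np.random.default_rng(seed)
    true_means = 1.5 * rng.standard_normal((n_states, d_in))
    y = rng.integers(0, n_states, size=n)
    X = true_means[y] + rng.standard_normal((n, d_in))
    return X, y


def objective_and_gradient(A, X, y, means):
    """Mean log normalized likelihood of the labeled state, and its gradient in A.

    Assumption: state k is N(A m_k, I) in the transformed space, so the
    log-likelihood of frame x under state k is -0.5 * ||A (x - m_k)||^2 + const.
    """
    n = len(y)
    diffs = X[:, None, :] - means[None, :, :]       # (n, K, d_in)
    proj = diffs @ A.T                              # (n, K, d_out)
    scores = -0.5 * np.sum(proj ** 2, axis=2)       # (n, K)
    # Numerically stable log of the sum of likelihoods over all states.
    mx = scores.max(axis=1, keepdims=True)
    log_norm = mx[:, 0] + np.log(np.exp(scores - mx).sum(axis=1))
    post = np.exp(scores - log_norm[:, None])       # normalized likelihoods
    obj = np.mean(scores[np.arange(n), y] - log_norm)
    # d(obj)/dA = mean_n sum_k (post[n, k] - 1[k == y_n]) * (A d_nk) d_nk^T
    w = post.copy()
    w[np.arange(n), y] -= 1.0
    grad = np.einsum("nk,nki,nkj->ij", w, proj, diffs) / n
    return obj, grad


def fit_transform(X, y, d_out=2, steps=200, lr=0.01, seed=0):
    """Gradient ascent on A; a step is kept only if the objective improves."""
    rng = np.random.default_rng(seed)
    n_states = int(y.max()) + 1
    means = np.stack([X[y == k].mean(axis=0) for k in range(n_states)])
    A = 0.1 * rng.standard_normal((d_out, X.shape[1]))
    obj, grad = objective_and_gradient(A, X, y, means)
    history = [obj]
    for _ in range(steps):
        candidate = A + lr * grad
        new_obj, new_grad = objective_and_gradient(candidate, X, y, means)
        if new_obj > obj:
            A, obj, grad = candidate, new_obj, new_grad
        else:
            lr *= 0.5  # step was too large: shrink it and retry
        history.append(obj)
    return A, history


if __name__ == "__main__":
    X, y = make_toy_data()
    A, history = fit_transform(X, y)
    print("objective before:", history[0], "after:", history[-1])
```

Because the objective rewards the labeled state only relative to its competitors, it behaves like a discriminative criterion, which is the property the abstract ties to classification error. A real system would use the recognizer's own state models and alignments rather than the toy Gaussians assumed here.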

Keywords :

error statistics; feature extraction; gradient methods; matrix algebra; maximum likelihood sequence estimation; signal representation; spectral analysis; speech recognition; state estimation; DARPA RM corpus; classification error rate; feature generation; feature representation; gradient ascent optimization process; improved speech recognition; linear transformation; log-spectral representation; maximum normalized acoustic likelihood; most likely state sequence; performance; state training data; transformation matrix; word error rates; Closed-form solution; Computer science; Covariance matrix; Iterative methods; Maximum likelihood estimation; Optimization methods; Partitioning algorithms; Probability distribution; Speech recognition; Training data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-8484-9

Type :

conf

DOI :

10.1109/ICASSP.2004.1326043

Filename :

1326043

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=417209