Dimensionality reduction methods for HMM phonetic recognition

Author

Hu, Hongbing ; Zahorian, Stephen A.

Author_Institution

Dept. of Electr. & Comput. Eng., Binghamton Univ., Binghamton, NY, USA

fYear

2010

fDate

14-19 March 2010

Firstpage

4854

Lastpage

4857

Abstract

This paper presents two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system. The neural networks are trained as feature classifiers to reduce feature dimensionality as well as to maximize discrimination among speech features. The outputs of different network layers are used to obtain the transformed features. Moreover, the training of the neural networks uses category information that corresponds to states in the HMMs, so that the trained networks can better accommodate the temporal variability of features and obtain more discriminative features in a low-dimensional space. Experimental evaluation using the TIMIT database shows that recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods. The highest phone accuracy obtained on TIMIT with 39 phone classes was 74.9%, using a large number of training iterations based on the state-specific targets.

Keywords

feature extraction; hidden Markov models; neural nets; pattern classification; speech processing; speech recognition; HMM phonetic recognition; HMM-based phone recognition system; TIMIT database; feature classifier; linear dimensionality reduction method; low dimensional space; neural network; nonlinear feature dimensionality reduction method; speech feature; temporal features variability; Hidden Markov models; Linear discriminant analysis; Multi-layer neural network; Neural networks; Principal component analysis; Spatial databases; Speech recognition; State estimation; HMMs; dimensionality reduction; neural networks; nonlinear discriminant analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5495130

Filename

5495130