DocumentCode
2791669
Title
Dimensionality reduction methods for HMM phonetic recognition
Author
Hu, Hongbing ; Zahorian, Stephen A.
Author_Institution
Dept. of Electr. & Comput. Eng., Binghamton Univ., Binghamton, NY, USA
fYear
2010
fDate
14-19 March 2010
Firstpage
4854
Lastpage
4857
Abstract
This paper presents two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system. The neural networks are trained as feature classifiers to reduce feature dimensionality as well as maximize discrimination among speech features. The outputs of different network layers are used for obtaining transformed features. Moreover, the training of the neural networks uses category information that corresponds to a state in the HMMs, so that the trained networks can better accommodate the temporal variability of features and obtain more discriminative features in a low-dimensional space. Experimental evaluation using the TIMIT database shows that recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods. The highest phone accuracy obtained on TIMIT with 39 phone classes was 74.9%, using a large number of training iterations based on the state-specific targets.
Keywords
feature extraction; hidden Markov models; neural nets; pattern classification; speech processing; speech recognition; HMM phonetic recognition; HMM-based phone recognition system; TIMIT database; feature classifier; linear dimensionality reduction method; low dimensional space; neural network; nonlinear feature dimensionality reduction method; speech feature; temporal features variability; Hidden Markov models; Linear discriminant analysis; Multi-layer neural network; Neural networks; Principal component analysis; Spatial databases; Speech recognition; State estimation; HMMs; dimensionality reduction; neural networks; nonlinear discriminant analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location
Dallas, TX
ISSN
1520-6149
Print_ISBN
978-1-4244-4295-9
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2010.5495130
Filename
5495130