• DocumentCode
    2791669
  • Title

    Dimensionality reduction methods for HMM phonetic recognition

  • Author

    Hu, Hongbing ; Zahorian, Stephen A.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Binghamton Univ., Binghamton, NY, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    4854
  • Lastpage
    4857
  • Abstract
    This paper presents two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system. The neural networks are trained as feature classifiers to reduce feature dimensionality as well as maximize discrimination among speech features. The outputs of different network layers are used for obtaining transformed features. Moreover, the training of the neural networks uses the category information that corresponds to a state in HMMs so that the trained networks can better accommodate the temporal variability of features and obtain more discriminative features in a low dimensional space. Experimental evaluation using the TIMIT database shows that recognition accuracies with the transformed features are slightly higher than those obtained with original features and considerably higher than those obtained with linear dimensionality reduction methods. The highest phone accuracy obtained on TIMIT with 39 phone classes was 74.9%, achieved with a large number of training iterations based on the state-specific targets.
  • Keywords
    feature extraction; hidden Markov models; neural nets; pattern classification; speech processing; speech recognition; HMM phonetic recognition; HMM-based phone recognition system; TIMIT database; feature classifier; linear dimensionality reduction method; low dimensional space; neural network; nonlinear feature dimensionality reduction method; speech feature; temporal features variability; Hidden Markov models; Linear discriminant analysis; Multi-layer neural network; Neural networks; Principal component analysis; Spatial databases; Speech recognition; State estimation; HMMs; dimensionality reduction; neural networks; nonlinear discriminant analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2010.5495130
  • Filename
    5495130