• DocumentCode
    855017
  • Title

    Robust combination of neural networks and hidden Markov models for speech recognition

  • Author

    Trentin, Edmondo ; Gori, Marco

  • Volume
    14
  • Issue
    6
  • fYear
    2003
  • Firstpage
    1519
  • Lastpage
    1531
  • Abstract
    Acoustic modeling in state-of-the-art speech recognition systems usually relies on hidden Markov models (HMMs) with Gaussian emission densities. HMMs suffer from intrinsic limitations, mainly due to their arbitrary parametric assumption. Artificial neural networks (ANNs) appear to be a promising alternative in this respect, but they historically failed as a general solution to the acoustic modeling problem. This paper introduces algorithms based on a gradient-ascent technique for global training of a hybrid ANN/HMM system, in which the ANN is trained to estimate the emission probabilities of the states of the HMM. The approach is related to the major hybrid systems proposed by Bourlard and Morgan and by Bengio, with the aim of combining their benefits within a unified framework and overcoming their limitations. Several viable solutions to the "divergence problem", which may arise when training is accomplished over the maximum-likelihood (ML) criterion, are proposed. Experimental results in speaker-independent, continuous speech recognition over Italian digit-strings validate the novel hybrid framework, allowing for improved recognition performance over HMMs with mixtures of Gaussian components, as well as over Bourlard and Morgan's paradigm. In particular, it is shown that the maximum a posteriori (MAP) version of the algorithm yields a 46.34% relative word error rate reduction with respect to standard HMMs.
  • Keywords
    Gaussian processes; gradient methods; hidden Markov models; learning (artificial intelligence); maximum likelihood estimation; neural nets; optimisation; speech recognition; Bourlard and Morgan; Gaussian emission density; arbitrary parametric assumption; artificial neural network; divergence problem; emission probabilities; global optimization; gradient-ascent technique; hidden Markov model; maximum a posteriori; maximum-likelihood criterion; neural network; robust combination; speech recognition; Acoustic emission; Artificial neural networks; Automatic speech recognition; Dictionaries; Hidden Markov models; Maximum likelihood estimation; Neural networks; Robustness; Speech recognition; State estimation
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type

    jour

  • DOI
    10.1109/TNN.2003.820838
  • Filename
    1257414