  • DocumentCode
    310460
  • Title
    Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks
  • Author
    Schuster, Mike
  • Author_Institution
    ATR Interpreting Telephony Res. Labs., Kyoto, Japan
  • Volume
    4
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    3249
  • Abstract
    A new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time-bidirectional recurrent neural network. Whereas neural networks in speech recognition systems are usually used to estimate posterior “frame to phoneme” probabilities, here they are used to estimate “segment to phoneme” probabilities directly, which results in an improved duration model. The special MAP approach allows the incorporation of long-term dependencies not only on the acoustic side but also on the phone (output) side, which automatically yields parameter-efficient context-dependent models. While the use of neural networks as frame or phoneme classifiers always results in discriminative training for the acoustic information, the MAP approach presented here also incorporates discriminative training for the internally learned phoneme language model. Classification tests on the TIMIT phoneme database gave promising results of 77.75% (82.38%) for the full test data set with all 61 (39) symbols.
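    The abstract's central mechanism, a recurrent network that sees the whole input sequence in both time directions before producing a class posterior, can be sketched in a few lines. This is a minimal illustrative sketch only, not the paper's model: the class name `TinyBRNN`, the bias-free tanh recurrences, the random initialization, and the assumption that each input step is one precomputed segment feature vector are all hypothetical. Segment formation, the MAP decomposition over phone context, and training are not shown.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    e = [math.exp(x - top) for x in z]
    s = sum(e)
    return [x / s for x in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights (hypothetical initialization)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyBRNN:
    """Hypothetical minimal bidirectional RNN: one forward and one backward
    tanh recurrence whose states are concatenated and mapped to class
    posteriors with a softmax at every step. Bias terms are omitted."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Forward-direction input and recurrent weights.
        self.wf = rand_matrix(n_hidden, n_in, rng)
        self.uf = rand_matrix(n_hidden, n_hidden, rng)
        # Backward-direction input and recurrent weights.
        self.wb = rand_matrix(n_hidden, n_in, rng)
        self.ub = rand_matrix(n_hidden, n_hidden, rng)
        # Output weights act on the concatenation [h_forward; h_backward].
        self.v = rand_matrix(n_classes, 2 * n_hidden, rng)

    def _run(self, xs, w, u):
        """Run one tanh recurrence over xs, returning the state at each step."""
        h = [0.0] * self.n_hidden
        states = []
        for x in xs:
            h = [math.tanh(a + b) for a, b in zip(matvec(w, x), matvec(u, h))]
            states.append(h)
        return states

    def posteriors(self, xs):
        """Return one class-posterior vector per input step."""
        fwd = self._run(xs, self.wf, self.uf)
        # Process the reversed sequence, then restore the original time order.
        bwd = self._run(xs[::-1], self.wb, self.ub)[::-1]
        return [softmax(matvec(self.v, hf + hb)) for hf, hb in zip(fwd, bwd)]


if __name__ == "__main__":
    net = TinyBRNN(n_in=3, n_hidden=4, n_classes=5)
    # Three hypothetical segment feature vectors.
    segments = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.2, 0.2]]
    for t, p in enumerate(net.posteriors(segments)):
        best = max(range(len(p)), key=p.__getitem__)
        print(t, best, round(sum(p), 6))
```

    Because the backward recurrence runs over the reversed sequence, the posterior at step t depends on inputs after t as well as before it, which is what lets a bidirectional network use context on both sides of a segment.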
  • Keywords
    acoustic signal processing; feature extraction; learning (artificial intelligence); maximum likelihood estimation; pattern classification; recurrent neural nets; speech processing; speech recognition; TIMIT phoneme database; acoustic model building; bidirectional recurrent neural networks; classification tests; discriminative training; duration model; frame classifiers; long term dependencies; nonuniform segments; parameter efficient context dependent models; phoneme classifiers; phoneme language model; segment to phoneme probabilities; speech recognition systems; test data set; Acoustic testing; Databases; Error analysis; Merging; Neural networks; Pattern recognition; Probability; Recurrent neural networks; Speech recognition; Statistical analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type
    conf
  • DOI
    10.1109/ICASSP.1997.595486
  • Filename
    595486