  • DocumentCode
    730714
  • Title
    Modeling long temporal contexts in convolutional neural network-based phone recognition
  • Author
    Toth, Laszlo
  • Author_Institution
    MTA-SZTE Res. Group on Artificial Intell., Univ. of Szeged, Szeged, Hungary
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4575
  • Lastpage
    4579
  • Abstract
    The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors. Here, we investigate whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks. One method for this is to train a hierarchy of two networks, while the less well-known split temporal context (STC) method models the left and right contexts of a frame separately. Here, we evaluate these techniques within a convolutional neural network framework, and find that the two approaches can be nicely combined. With the combined model we can expand the time-span of our network to 69 frames, and we achieve a 7.5% relative error rate reduction compared to modeling this large context as one block. We report a phone error rate of 17.1% on the TIMIT core test set, which is one of the best scores published.
  • Keywords
    convolution; learning (artificial intelligence); neural nets; speech recognition; vectors; STC method; TIMIT core test set; consecutive feature vectors; convolutional neural network-based phone recognition; current hybrid speech recognizers; deep neural network component; long temporal context modeling; phone error rate; relative error rate reduction; split temporal context method; Context; Context modeling; Convolution; Error analysis; Hidden Markov models; Neural networks; Speech recognition; Deep neural network; TIMIT; convolutional neural network; maxout; split temporal context;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178837
  • Filename
    7178837