• DocumentCode
    3752152
  • Title
    Improving bottleneck features for automatic speech recognition using gammatone-based cochleagram and sparsity regularization

  • Author
    Chao Ma; Jun Qi; Dongmei Li; Runsheng Liu

  • Author_Institution
    Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • fYear
    2015
  • Firstpage
    63
  • Lastpage
    67
  • Abstract
    Bottleneck (BN) features, particularly those extracted from deep neural network structures, have been applied successfully to Automatic Speech Recognition (ASR) tasks. This paper studies two methods for further improving BN features for ASR: (1) using a cochleagram generated by Gammatone filters as the input feature to a deep neural network; and (2) imposing a sparsity regularization on the bottleneck layer, which controls the sparsity level of the BN features by constraining the hidden-unit activations to be inactive on average. Experiments on the Wall Street Journal (WSJ) database demonstrate that both approaches yield performance gains for BN features in ASR tasks. In addition, experiments on the WSJ database at different noise levels show that the cochleagram input is more noise-robust than the commonly used Mel-scaled filterbank.
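    As a rough illustration of the second method, the sketch below applies a KL-divergence sparsity penalty to the mean activations of a bottleneck layer, the usual way of constraining hidden units to be inactive on average (as in sparse autoencoders). The target activation rho, the penalty weight beta, and the PyTorch framing are illustrative assumptions, not details taken from the paper.

        import torch

        def bottleneck_sparsity_penalty(bn_activations, rho=0.05, beta=0.1, eps=1e-8):
            """KL-divergence sparsity penalty on sigmoid bottleneck activations.

            bn_activations: tensor of shape (batch, bn_dim), values in (0, 1).
            rho:  target average activation per bottleneck unit (assumed value).
            beta: weight of the penalty when added to the training cost (assumed value).
            """
            rho_hat = bn_activations.mean(dim=0)  # observed average activation per unit
            kl = rho * torch.log(rho / (rho_hat + eps)) \
                 + (1.0 - rho) * torch.log((1.0 - rho) / (1.0 - rho_hat + eps))
            return beta * kl.sum()

    During training, such a term would be added to the network's cross-entropy cost so that gradient descent drives the average bottleneck activations toward the target rho, yielding sparser BN features.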
  • Keywords
    "Neural networks","Frequency modulation","Indexes","Cost function","Automatic speech recognition","Mel frequency cepstral coefficient"
  • Publisher
    IEEE
  • Conference_Titel
    2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
  • Type
    conf
  • DOI
    10.1109/APSIPA.2015.7415401
  • Filename
    7415401