• DocumentCode
    178399
  • Title
    Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions
  • Author
    Thomas, Stephan ; Ganapathy, Shrikanth ; Saon, George ; Soltau, Hagen
  • Author_Institution
    IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    2519
  • Lastpage
    2523
  • Abstract
    Convolutional neural networks (CNNs) are extensions of deep neural networks (DNNs) that are used as alternative acoustic models with state-of-the-art performance for speech recognition. In this paper, CNNs are used as acoustic models for speech activity detection (SAD) on data collected over noisy radio communication channels. When these SAD models are tested on audio recorded from radio channels not seen during training, performance degrades severely. We attribute this degradation to mismatches between the two-dimensional filters learnt in the initial CNN layers and the novel channel data. Using a small amount of supervised data from the novel channels, the filters can be adapted to provide significant improvements in SAD performance. In mismatched acoustic conditions, the adapted models provide significant improvements (about 10-25%) relative to conventional DNN-based SAD systems. These results illustrate that CNNs have a considerable advantage in fast adaptation for acoustic modeling in these settings.
  • Keywords
    audio recording; filtering theory; neural nets; speech recognition; telecommunication computing; wireless channels; CNN analysis; DNN; SAD model; convolutional neural network analysis; data collection; deep neural network; mismatched acoustic condition; noisy radio communication channel; radio channel; speech activity detection; two dimensional filter; Acoustics; Adaptation models; Feature extraction; Hidden Markov models; Neural networks; Speech; Convolutional neural networks; Neural network adaptation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854054
  • Filename
    6854054