• DocumentCode
    730699
  • Title
    Exemplar-based speech enhancement for deep neural network based automatic speech recognition
  • Author
    Baby, Deepak ; Gemmeke, Jort F. ; Virtanen, Tuomas ; Van hamme, Hugo
  • Author_Institution
    Dept. ESAT, KU Leuven, Leuven, Belgium
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4485
  • Lastpage
    4489
  • Abstract
    Deep neural network (DNN) based acoustic modelling has been used successfully for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled from a domain of choice. The resulting weights are then applied to a coupled output dictionary containing exemplars sampled in the short-time Fourier transform (STFT) domain, which directly yields the speech and noise estimates used for speech enhancement. In this work, settings using input dictionaries of exemplars sampled from the STFT, Mel-integrated magnitude STFT and modulation envelope spectra are evaluated. Experiments performed on the AURORA-4 database revealed that these pre-processing stages can improve the performance of DNN-HMM-based ASR systems with both clean and multi-condition training.
  • Keywords
    Fourier transforms; hidden Markov models; learning (artificial intelligence); signal denoising; speech enhancement; speech recognition; AURORA-4 database; DNN-HMM-based ASR systems; DNN-based systems; coupled output dictionary; deep neural network based acoustic modelling; deep neural network based automatic speech recognition; exemplar-based speech enhancement technique; mel-integrated magnitude STFT; modulation envelope spectra; multicondition training; multiple hidden layers; noisy speech decomposition; preprocessing stage; short-time Fourier transform domain; weighted sum-of-atoms; Computational modeling; Neural networks; Speech; Speech recognition; Testing; Training; coupled dictionaries; deep neural networks; modulation envelope; non-negative matrix factorisation; speech enhancement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178819
  • Filename
    7178819
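
The coupled-dictionary idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single frame, a hypothetical 4-bin STFT magnitude domain, a hypothetical 2-band "Mel-like" input domain formed by summing adjacent bins, one speech atom and one noise atom, KL-divergence multiplicative updates for the non-negative weights, and a Wiener-like gain for the final enhancement step. All function names and numbers are illustrative.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def nonneg_activations(A, y, n_iter=200, eps=1e-12):
    """Find non-negative weights x such that y ~= A x.

    Uses the standard multiplicative update for the KL divergence
    (an assumption here; the record does not state the cost function).
    """
    n_atoms = len(A[0])
    col_sums = [sum(row[k] for row in A) + eps for k in range(n_atoms)]
    x = [1.0] * n_atoms
    for _ in range(n_iter):
        approx = matvec(A, x)
        ratio = [yi / (ai + eps) for yi, ai in zip(y, approx)]
        x = [
            x[k] * sum(A[i][k] * ratio[i] for i in range(len(A))) / col_sums[k]
            for k in range(n_atoms)
        ]
    return x


def enhance(y_in, y_stft, A_in, A_out, n_speech, eps=1e-12):
    """Decompose in the input domain, reconstruct in the STFT domain.

    The first n_speech atoms are speech exemplars, the rest are noise.
    Returns (enhanced magnitude, speech estimate, noise estimate).
    """
    x = nonneg_activations(A_in, y_in)
    n_noise = len(x) - n_speech
    x_speech = x[:n_speech] + [0.0] * n_noise
    x_noise = [0.0] * n_speech + x[n_speech:]
    # Same weights, applied to the coupled output dictionary.
    s_hat = matvec(A_out, x_speech)
    n_hat = matvec(A_out, x_noise)
    # Wiener-like gain (assumed enhancement rule for this sketch).
    gain = [s / (s + n + eps) for s, n in zip(s_hat, n_hat)]
    enhanced = [g * v for g, v in zip(gain, y_stft)]
    return enhanced, s_hat, n_hat


# Output dictionary: columns are STFT-domain atoms [speech, noise].
A_out = [[4.0, 0.0], [2.0, 1.0], [1.0, 2.0], [0.0, 4.0]]
# Coupled input dictionary: the same atoms after summing adjacent bin pairs.
A_in = [[6.0, 1.0], [1.0, 6.0]]

# Noisy frame built as 2 * speech atom + 1 * noise atom.
y_stft = [8.0, 5.0, 4.0, 4.0]
y_in = [y_stft[0] + y_stft[1], y_stft[2] + y_stft[3]]

weights = nonneg_activations(A_in, y_in)
enhanced, s_hat, n_hat = enhance(y_in, y_stft, A_in, A_out, n_speech=1)
print([round(w, 3) for w in weights])
print([round(v, 3) for v in enhanced])
```

The weights are estimated only in the low-dimensional input domain, while the speech and noise estimates are read off at full STFT resolution from the coupled output dictionary. A practical system would use many exemplars spanning several frames, which this sketch leaves out.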