  • DocumentCode
    3164978
  • Title
    Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?
  • Author
    Weninger, Felix ; Wöllmer, Martin ; Geiger, Jürgen ; Schuller, Björn ; Gemmeke, Jort F. ; Hurmalainen, Antti ; Virtanen, Tuomas ; Rigoll, Gerhard
  • Author_Institution
    Inst. for Human-Machine Commun., Tech. Univ. München, München, Germany
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4681
  • Lastpage
    4684
  • Abstract
    This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods to improve automatic speech recognition in highly noisy and reverberant environments, as featured in the 2011 PASCAL CHiME Challenge. We integrate word predictions by a bidirectional Long Short-Term Memory recurrent neural network and non-negative sparse classification (NSC) into a multi-stream Hidden Markov Model using convolutive non-negative matrix factorization (NMF) for speech enhancement. Our results suggest that NMF-based enhancement and NSC are complementary despite their overlap in methodology, reaching up to 91.9% average keyword accuracy on the Challenge test set at signal-to-noise ratios from -6 to 9 dB, the best result reported so far on these data.
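    As a rough illustration of the NMF-based enhancement idea named in the abstract, the following Python sketch assumes plain (non-convolutive) NMF with fixed, pre-trained speech and noise dictionaries, a Euclidean cost minimized by the standard multiplicative update, and a Wiener-style soft mask. It is a simplified stand-in, not the convolutive NMF front-end used in the paper; the functions matmul, transpose, nmf_activations and enhance, and the toy dictionaries, are hypothetical.

```python
# Hypothetical sketch: supervised NMF speech enhancement with fixed dictionaries.
# Simplification (not the paper's method): plain, non-convolutive NMF with a
# Euclidean cost, using pure-Python lists of rows as matrices.


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]


def nmf_activations(V, W, iters=200, eps=1e-12):
    """Estimate non-negative activations H with the dictionary W held fixed.

    Minimizes ||V - W H||^2 with the multiplicative update
    H <- H * (W^T V) / (W^T W H), which keeps H non-negative.
    """
    K, T = len(W[0]), len(V[0])
    Wt = transpose(W)
    WtV = matmul(Wt, V)  # constant across iterations because W is fixed
    WtW = matmul(Wt, W)
    H = [[1.0] * T for _ in range(K)]  # strictly positive initialization
    for _ in range(iters):
        WtWH = matmul(WtW, H)
        H = [[H[k][t] * WtV[k][t] / (WtWH[k][t] + eps) for t in range(T)]
             for k in range(K)]
    return H


def enhance(V, W_speech, W_noise, iters=200, eps=1e-12):
    """Return a speech estimate of the magnitude spectrogram V (F x T).

    The mixture is modeled as W_speech @ H_s + W_noise @ H_n; the speech
    estimate applies the soft mask S / (S + N) to V element-wise.
    """
    W = [ws + wn for ws, wn in zip(W_speech, W_noise)]  # F x (Ks + Kn)
    H = nmf_activations(V, W, iters, eps)
    Ks = len(W_speech[0])
    S = matmul(W_speech, H[:Ks])  # speech part of the reconstruction
    N = matmul(W_noise, H[Ks:])   # noise part of the reconstruction
    return [[V[f][t] * S[f][t] / (S[f][t] + N[f][t] + eps)
             for t in range(len(V[0]))] for f in range(len(V))]


if __name__ == "__main__":
    # Toy 4-bin spectrogram: speech occupies bins 0-1, noise bins 2-3.
    V = [[2.0, 3.0], [2.0, 3.0], [5.0, 1.0], [5.0, 1.0]]
    W_speech = [[1.0], [1.0], [0.0], [0.0]]
    W_noise = [[0.0], [0.0], [1.0], [1.0]]
    S_hat = enhance(V, W_speech, W_noise)
    print([[round(x, 6) for x in row] for row in S_hat])
    # [[2.0, 3.0], [2.0, 3.0], [0.0, 0.0], [0.0, 0.0]]
```

    Concatenating the two dictionaries lets a single activation matrix explain the noisy mixture, so speech and noise compete for each time-frequency bin; the soft mask then assigns each bin of V to speech in proportion to the speech share of the reconstruction.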
  • Keywords
    hidden Markov models; matrix decomposition; recurrent neural nets; speech enhancement; speech recognition; automatic speech recognition; average keyword accuracy; bidirectional long short term memory recurrent neural network; challenge test set; complementary analysis method; convolutive nonnegative matrix factorization; multistream hidden Markov model; multistream speech recognition; noise robust ASR; nonnegative sparse classification; signal-to-noise ratio; speech enhancement; word predictions; Hidden Markov models; Mel frequency cepstral coefficient; Noise; Speech; Speech enhancement; Speech recognition; Training; Non-Negative Matrix Factorization; Tandem Speech Recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288963
  • Filename
    6288963