• DocumentCode
    2182455
  • Title

    Localization of non-linguistic events in spontaneous speech by Non-Negative Matrix Factorization and Long Short-Term Memory

  • Author

    Weninger, Felix ; Schuller, Björn ; Wöllmer, Martin ; Rigoll, Gerhard

  • Author_Institution
    Inst. for Human-Machine Commun., Tech. Univ. München, Munich, Germany
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5840
  • Lastpage
    5843
  • Abstract
    Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach by integrating likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i.e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
  • Keywords
    hidden Markov models; matrix decomposition; speech recognition; BLSTM-RNN; NMF; baseline conventional phoneme-HMM-based speech recognizer; bidirectional long short-term memory recurrent neural networks; noise-robust speech recognition; nonlinguistic events; nonnegative matrix factorization; speech processing; Feature extraction; Hidden Markov models; Noise; Recurrent neural networks; Speech; Speech recognition; Training; Non-Linguistic Vocalizations; Non-Negative Matrix Factorization; Recurrent Neural Networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2011.5947689
  • Filename
    5947689