• DocumentCode
    164830
  • Title

    Spectrogram patch based acoustic event detection and classification in speech overlapping conditions

  • Author

    Espi, Miquel ; Fujimoto, Masakiyo ; Kubo, Yotaro ; Nakatani, Tomohiro

  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
  • fYear
    2014
  • fDate
    12-14 May 2014
  • Firstpage
    117
  • Lastpage
    121
  • Abstract
    Speech does not always contain all the information needed to understand a conversation scene. Non-speech events can reveal aspects of the scene that speakers miss or neglect to mention, and can further support speech enhancement and recognition systems with information about the surrounding noise. This paper focuses on detecting and classifying acoustic events in a conversation scene, where such events often overlap with speech. State-of-the-art techniques are based on derived features (e.g., MFCCs or Mel filter banks), which have successfully parameterized speech spectrograms but reduce both resolution and detail when targeting other kinds of events. In this paper, we propose a method that learns hidden features directly from spectrogram patches and integrates them within the deep neural network framework to detect and classify acoustic events. The result is a model that performs feature extraction and classification simultaneously. Experiments on the CHIL2007-AED task confirm that the proposed method outperforms both deep neural networks with derived features and related work, while showing that there is room for further improvement.
  • Keywords
    feature extraction; neural nets; speech enhancement; speech recognition; CHIL2007-AED task; deep neural network framework; feature classification; nonspeech events; parameterized speech spectrograms; spectrogram patch based acoustic event detection; speech overlapping conditions; speech recognition system; Acoustics; Conferences; Feature extraction; Hidden Markov models; Spectrogram; Speech; Training; acoustic event detection; communication scene understanding; spectrogram patch
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014 4th Joint Workshop on
  • Conference_Location
    Villers-lès-Nancy
  • Type

    conf

  • DOI
    10.1109/HSCMA.2014.6843263
  • Filename
    6843263