DocumentCode
164830
Title
Spectrogram patch based acoustic event detection and classification in speech overlapping conditions
Author
Espi, Miquel ; Fujimoto, Mitoshi ; Kubo, Yuji ; Nakatani, Takeshi
Author_Institution
NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
fYear
2014
fDate
12-14 May 2014
Firstpage
117
Lastpage
121
Abstract
Speech does not always contain all the information needed to understand a conversation scene. Non-speech events can reveal aspects of the scene that speakers miss or neglect to mention, and can further support speech enhancement and recognition systems with information about the surrounding noise. This paper focuses on the task of detecting and classifying acoustic events in a conversation scene, where such events often overlap with speech. State-of-the-art techniques are based on derived features (e.g. MFCC or Mel filter banks), which have successfully parameterized speech spectrograms but reduce both resolution and detail when targeting other kinds of events. In this paper, we propose a method that learns hidden features directly from spectrogram patches and integrates them within the deep neural network framework to detect and classify acoustic events. The result is a model that performs feature extraction and classification simultaneously. Experiments confirm that the proposed method outperforms deep neural networks with derived features as well as related work on the CHIL2007-AED task, showing that there is room for further improvement.
Keywords
feature extraction; neural nets; speech enhancement; speech recognition; CHIL2007-AED task; deep neural network framework; feature classification; feature extraction; nonspeech events; parameterized speech spectrograms; spectrogram patch based acoustic event detection; speech enhancement; speech overlapping conditions; speech recognition system; Acoustics; Conferences; Feature extraction; Hidden Markov models; Spectrogram; Speech; Training; acoustic event detection; communication scene understanding; spectrogram patch;
fLanguage
English
Publisher
IEEE
Conference_Titel
Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014 4th Joint Workshop on
Conference_Location
Villers-les-Nancy
Type
conf
DOI
10.1109/HSCMA.2014.6843263
Filename
6843263