DocumentCode :
1484843
Title :
A Binaural Scene Analyzer for Joint Localization and Recognition of Speakers in the Presence of Interfering Noise Sources and Reverberation
Author :
May, Tobias ; van de Par, Steven ; Kohlrausch, Armin
Author_Institution :
Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany
Volume :
20
Issue :
7
fYear :
2012
Firstpage :
2016
Lastpage :
2030
Abstract :
In this study, we present a binaural scene analyzer that is able to simultaneously localize, detect and identify a known number of target speakers in the presence of spatially positioned noise sources and reverberation. In contrast to many other binaural cocktail party processors, the proposed system does not require a priori knowledge about the azimuth position of the target speakers. The proposed system consists of three main building blocks: binaural localization, speech source detection, and automatic speaker identification. First, a binaural front-end is used to robustly localize relevant sound source activity. Second, a speech detection module based on missing data classification is employed to determine whether detected sound source activity corresponds to a speaker or to an interfering noise source using a binary mask that is based on spatial evidence supplied by the binaural front-end. Third, a second missing data classifier is used to recognize the speaker identities of all detected speech sources. The proposed system is systematically evaluated in simulated adverse acoustic scenarios. Compared to state-of-the-art MFCC recognizers, the proposed model achieves significant speaker recognition accuracy improvements.
Keywords :
interference; pattern classification; reverberation; speaker recognition; MFCC recognizer; adverse acoustic simulation; automatic speaker identification; binary mask; binaural cocktail party processor; binaural localization; binaural scene analyzer; joint localization; missing data classification; reverberation interference; second missing data classifier; sound source activity; spatially positioned noise source interference; speaker localization; speech detection module; speech source detection; Acoustics; Auditory system; Humans; Noise; Speech; Speech recognition; Target recognition; Automatic speaker recognition; binaural processing; computational auditory scene analysis (CASA); mask estimation; missing data
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2193391
Filename :
6178270