DocumentCode :
48683
Title :
Video-Aided Model-Based Source Separation in Real Reverberant Rooms
Author :
Khan, M.S. ; Naqvi, Syed Mohsen ; Ur-Rehman, Ata ; Wenwu Wang ; Chambers, Jonathon
Author_Institution :
Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK
Volume :
21
Issue :
9
fYear :
2013
fDate :
Sept. 2013
Firstpage :
1900
Lastpage :
1912
Abstract :
Source separation algorithms that use only audio data can perform poorly when multiple sources or reverberation are present. In this paper, we therefore propose a video-aided model-based source separation algorithm for two-channel reverberant recordings in which the sources are assumed static. By exploiting cues from video, we first localize the individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, namely the interaural phase difference (IPD) and the interaural level difference (ILD), as well as the mixing vectors, are modeled probabilistically. The models incorporate the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm, and the algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that, by exploiting the visual modality, the proposed algorithm produces better time-frequency masks and hence improved source estimates. Experimental results in different scenarios, with comparisons against other audio-only and audio-visual algorithms, confirm improved performance on both synthetic and real data. We also include dereverberation-based pre-processing to suppress the late reverberant components of the observed stereo mixture and further enhance the overall output of the algorithm. These advantages make our algorithm a suitable candidate for under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
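The abstract's core loop (probabilistic modeling of interaural cues at time-frequency points, refined by EM, yielding masks) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that each source's IPD observations follow a narrow Gaussian, and it fits a two-component Gaussian mixture by EM to obtain a soft time-frequency mask. The phase offsets, initial means, and sample counts below are invented for the demo.

```python
# Minimal sketch (assumed model, not the paper's): soft time-frequency
# masking from interaural phase differences (IPDs) via EM on a
# two-component Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic IPD observations at discrete time-frequency points:
# two static sources at hypothetical phase offsets of 0.5 and -0.8 rad.
ipd = np.concatenate([
    rng.normal(0.5, 0.1, 500),    # TF points dominated by source 1
    rng.normal(-0.8, 0.1, 500),   # TF points dominated by source 2
])

# Crude initialization; the paper instead derives priors from
# video-based source directions.
mu = np.array([0.2, -0.2])
var = np.array([0.5, 0.5])
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: posterior probability of each source at each TF point;
    # these posteriors are exactly a soft time-frequency mask.
    lik = pi / np.sqrt(2 * np.pi * var) \
        * np.exp(-(ipd[:, None] - mu) ** 2 / (2 * var))
    mask = lik / lik.sum(axis=1, keepdims=True)
    # M-step: re-estimate the mixture parameters from the mask.
    n = mask.sum(axis=0)
    mu = (mask * ipd[:, None]).sum(axis=0) / n
    var = (mask * (ipd[:, None] - mu) ** 2).sum(axis=0) / n
    pi = n / len(ipd)

# In a full system, `mask` would weight the mixture STFT to
# reconstruct each source; here we just report the recovered means.
print(np.round(np.sort(mu), 2))
```

The EM posteriors double as the separation mask, which is the key structural idea the abstract describes; the paper's models are richer (ILD and mixing vectors, direction-informed priors), but the E-step/M-step alternation has this shape.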
Keywords :
audio signal processing; blind source separation; expectation-maximisation algorithm; image enhancement; speech intelligibility; video signal processing; EM algorithm; a two-channel reverberant recording; audio data; audio-only algorithms; audio-visual algorithms; dereverberation based preprocessing; direction estimation; discrete time-frequency points; expectation-maximization algorithm; individual source reconstruction; interaural level difference; interaural phase difference; interaural spatial cues; localize individual speech sources; mixing vectors; real reverberant rooms; reverberant component suppression; source direction information; stereo mixture; time-frequency masks; under-determined highly reverberant settings; video-aided model-based source separation; visual modality; Expectation-maximization; reverberation; source separation; spatial cues; time-frequency masking;
fLanguage :
English
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2261814
Filename :
6514058