DocumentCode
2183299
Title
Localization based stereo speech separation using deep networks
Author
Yu, Yang ; Wang, Wenwu ; Luo, Jian ; Feng, Pengming
Author_Institution
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China, 710072
fYear
2015
fDate
21-24 July 2015
Firstpage
153
Lastpage
157
Abstract
Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberation is present in the mixtures. In this paper, we propose a new stereo speech separation system in which deep networks are used to generate a soft T-F mask for separation. More specifically, the deep network, which is composed of two sparse autoencoders and a softmax classifier, is used to estimate the orientations of the target and interferers at each T-F unit, based on low-level features such as the mixing vector (MV) and the interaural level and phase differences (ILD/IPD). The deep network is trained by a greedy layer-wise method using a dataset generated by convolving room impulse responses (RIRs) with clean speech signals positioned at different angles with respect to the sensors. With the trained deep networks, the probability that each T-F unit belongs to the target or an interferer can be estimated from the localization cues and used to generate the soft mask. Experiments based on real binaural RIRs and the TIMIT dataset show the performance of the proposed system on reverberant speech mixtures, compared with a recently proposed model-based T-F masking technique.
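The low-level cues and the soft masking step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`ild_ipd`, `soft_mask`, `apply_mask`), the array shapes, and the choice to form the mask by summing the classifier posteriors over the orientation bins assigned to the target are all assumptions made for illustration. The deep network itself (sparse autoencoders plus softmax) is not shown; its output is represented by a hypothetical array of per-unit orientation probabilities.

```python
import numpy as np


def ild_ipd(left_stft, right_stft, eps=1e-12):
    """Per T-F unit interaural level difference (dB) and phase difference (rad).

    left_stft, right_stft: complex arrays of identical shape (n_freq, n_frames).
    eps is a small constant (assumed here) that avoids division by zero.
    """
    ild = 20.0 * np.log10((np.abs(left_stft) + eps) / (np.abs(right_stft) + eps))
    ipd = np.angle(left_stft * np.conj(right_stft))
    return ild, ipd


def soft_mask(direction_probs, target_bins):
    """Soft mask from hypothetical classifier posteriors.

    direction_probs: real array (n_directions, n_freq, n_frames) whose values
        sum to 1 over axis 0, standing in for the softmax output per T-F unit.
    target_bins: indices of the orientation classes attributed to the target
        (an assumption of this sketch).
    """
    return direction_probs[np.asarray(target_bins)].sum(axis=0)


def apply_mask(mask, left_stft, right_stft):
    """Scale both channels of the mixture by the same real-valued mask."""
    return mask * left_stft, mask * right_stft
```

Under these assumptions, the masked spectrograms would then be inverted to the time domain with an inverse STFT to obtain the separated target signal.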
Keywords
Feature extraction; Neural networks; Reverberation; Source separation; Speech; Speech processing; Training; Deep learning; Deep networks; Soft mask
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Signal Processing (DSP), 2015 IEEE International Conference on
Conference_Location
Singapore, Singapore
Type
conf
DOI
10.1109/ICDSP.2015.7251849
Filename
7251849