A blind separation algorithm of speech mixtures base on time-frequency masking

Author

Guo, Wei ; Zong, Qingquan

Author_Institution

Comput. Sch., Wuhan Univ., Wuhan, China

fYear

2012

fDate

21-23 April 2012

Firstpage

2258

Lastpage

2261

Abstract

We propose a blind separation algorithm for speech mixtures, based on time-frequency masking, that can separate any number of sources using only two mixtures. The method is valid when the sources are W-disjoint orthogonal, that is, when the supports of the windowed Fourier transforms of the signals in the mixture are disjoint. Performance is compared for floating-point and fixed-point implementations. A weighted K-means clustering algorithm is presented as an alternative to gradient descent methods for peak tracking and is shown to achieve excellent performance without adversely affecting computational load. In the time-frequency domain, we first extract the spatial cues of the speech signals, which are relative attenuation-delay pairs. Then, motivated by the maximum likelihood mixing parameter estimators, we define a power-weighted two-dimensional (2-D) histogram constructed from the ratio of the time-frequency representations of the mixtures; it is shown to have one peak for each source, with the peak location corresponding to the relative attenuation and delay mixing parameters. Next, we build time-frequency binary masks from these peaks and use them to separate the sources in the time-frequency domain. Finally, the inverse short-time Fourier transform (I-STFT) transforms the separated sources back to the time domain. In short, the proposed algorithm offers a new direction for research on blind separation of speech.
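The pipeline the abstract describes (spatial cues, clustering, binary masking) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes exactly W-disjoint orthogonal sources, works directly on synthetic time-frequency coefficients instead of computing an STFT and I-STFT, uses the plain attenuation |X2/X1| as the first cue, and replaces histogram peak picking with a simple weighted K-means whose initialisation is chosen only for this example. All function names, the checkerboard source layout, and the parameter values are hypothetical.

```python
import cmath
import math
import random


def mix(sources, params, omegas):
    """Anechoic two-channel mixing applied directly to TF coefficients.

    sources: list of {(frame, bin): complex}; params: list of (attenuation, delay).
    """
    x1, x2 = {}, {}
    for src, (a, d) in zip(sources, params):
        for (t, k), v in src.items():
            x1[(t, k)] = x1.get((t, k), 0j) + v
            x2[(t, k)] = x2.get((t, k), 0j) + a * cmath.exp(-1j * omegas[k] * d) * v
    return x1, x2


def spatial_cues(x1, x2, omegas, eps=1e-12):
    """Per-bin (attenuation, delay) pairs from the mixture ratio, plus power weights."""
    bins, feats, weights = [], [], []
    for key, v1 in x1.items():
        if abs(v1) < eps:
            continue  # skip empty bins, where the ratio is undefined
        ratio = x2[key] / v1
        feats.append((abs(ratio), -cmath.phase(ratio) / omegas[key[1]]))
        weights.append(abs(v1 * x2[key]))
        bins.append(key)
    return bins, feats, weights


def weighted_kmeans(feats, weights, k, iters=20):
    """Power-weighted K-means on 2-D cues (assumed init: spread over sorted points)."""
    order = sorted(range(len(feats)), key=lambda i: feats[i])
    centers = [feats[order[round(j * (len(feats) - 1) / max(k - 1, 1))]] for j in range(k)]
    labels = [0] * len(feats)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: (f[0] - centers[c][0]) ** 2 + (f[1] - centers[c][1]) ** 2)
            for f in feats
        ]
        for c in range(k):
            members = [(w, f) for w, f, lab in zip(weights, feats, labels) if lab == c]
            total = sum(w for w, _ in members)
            if total > 0:
                centers[c] = (
                    sum(w * f[0] for w, f in members) / total,
                    sum(w * f[1] for w, f in members) / total,
                )
    return centers, labels


# Synthetic demo: two sources on a checkerboard of TF bins (disjoint supports).
rng = random.Random(0)
n_frames, n_bins = 8, 16
omegas = [math.pi * (k + 1) / (n_bins + 1) for k in range(n_bins)]  # in (0, pi)
sources = [{}, {}]
for t in range(n_frames):
    for k in range(n_bins):
        value = cmath.rect(1.0 + rng.random(), rng.uniform(-math.pi, math.pi))
        sources[(t + k) % 2][(t, k)] = value

params = [(0.6, 0.5), (1.4, -0.5)]  # assumed (attenuation, delay in samples)
x1, x2 = mix(sources, params, omegas)
bins, feats, weights = spatial_cues(x1, x2, omegas)
centers, labels = weighted_kmeans(feats, weights, k=2)

# Binary masking: each TF bin of the first mixture goes to its cluster's source.
estimates = [
    {key: x1[key] for key, lab in zip(bins, labels) if lab == j} for j in range(2)
]
print(centers)
```

With real recordings the disjointness assumption holds only approximately, so the cues scatter around each source's attenuation-delay pair rather than landing on it exactly; an STFT front end and an I-STFT back end would also be needed to go from and to time-domain signals.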

Keywords

Fourier transforms; pattern clustering; speech processing; W-disjoint orthogonal; Weighted K-means clustering algorithm; blind separation algorithm; fixed point implementations; floating point implementations; maximum likelihood mixing parameter estimators; spatial cues; speech mixtures; speech signal; time frequency domain; time frequency masking; windowed Fourier transform; Attenuation; Clustering algorithms; Delay; Histograms; Source separation; Speech; Time frequency analysis; Blind Separation; Time-Frequency Masking; W-disjoint orthogonal;

fLanguage

English

Publisher

IEEE

Conference_Titel

Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on

Conference_Location

Yichang

Print_ISBN

978-1-4577-1414-6

Type

conf

DOI

10.1109/CECNet.2012.6201885

Filename

6201885