DocumentCode :
11607
Title :
Speech Analysis With the Strong Uncorrelating Transform
Author :
Okopal, Greg ; Wisdom, Scott ; Atlas, Les
Author_Institution :
Appl. Phys. Lab., Univ. of Washington, Seattle, WA, USA
Volume :
23
Issue :
11
fYear :
2015
fDate :
Nov. 2015
Firstpage :
1858
Lastpage :
1868
Abstract :
The strong uncorrelating transform (SUT) provides estimates of independent components from linear mixtures using only second-order information, provided that the components have unique circularity coefficients. We propose a processing framework for generating complex-valued subbands from real-valued mixtures of speech and noise where the objective is to control the likely values of the sample circularity coefficients of the underlying speech and noise components in each subband. We show how several processing parameters affect the noncircularity of speech-like and noise components in the subband, ultimately informing parameter choices that allow for estimation of each of the components in a subband using the SUT. Additionally, because the speech and noise components will have unique sample circularity coefficients, this statistic can be used to identify time-frequency regions that contain voiced speech. We give an example of the recovery of the circularity coefficients of a real speech signal from a two-channel noisy mixture at -25 dB SNR, which demonstrates how the estimates of noncircularity can reveal the time-frequency structure of a speech signal in very high levels of noise. Finally, we present the results of a voice activity detection (VAD) experiment showing that two new circularity-based statistics, one of which is derived from the SUT processing, can achieve improved performance over state-of-the-art VADs in real-world recordings of noise.
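As an illustration of the two quantities the abstract relies on, the following is a minimal sketch (not the authors' implementation) of a sample circularity coefficient and of an SUT estimated from sample second-order statistics. The function names `circularity_coefficient` and `strong_uncorrelating_transform` are hypothetical, and the Takagi factorization is obtained here from an SVD, which assumes the circularity coefficients are distinct and nonzero, in line with the uniqueness condition stated above.

```python
import numpy as np


def circularity_coefficient(z):
    """Sample circularity coefficient |E[z^2]| / E[|z|^2] of a complex signal."""
    z = np.asarray(z, dtype=complex)
    z = z - z.mean()  # the definition assumes zero mean
    return np.abs(np.mean(z * z)) / np.mean(np.abs(z) ** 2)


def strong_uncorrelating_transform(X):
    """Estimate the SUT from X, a (channels, samples) complex array.

    Returns (A, k) such that, for the centered data, A C A^H = I and
    A P A^T = diag(k), where C is the sample covariance, P the sample
    pseudo-covariance, and k the circularity coefficients in
    descending order.
    """
    X = np.asarray(X, dtype=complex)
    X = X - X.mean(axis=1, keepdims=True)
    n = X.shape[1]

    C = X @ X.conj().T / n  # covariance (Hermitian)
    P = X @ X.T / n         # pseudo-covariance (complex symmetric)

    # Whitening matrix W = C^{-1/2} from the eigendecomposition of C.
    w, E = np.linalg.eigh(C)
    W = (E / np.sqrt(w)) @ E.conj().T

    # Whitened pseudo-covariance; it is complex symmetric.
    M = W @ P @ W.T

    # Takagi factorization M = Q diag(k) Q^T built from the SVD
    # M = U diag(k) V^H. Assumes distinct, nonzero singular values,
    # so that conj(V) = U D for a diagonal matrix D of unit phases.
    U, k, Vh = np.linalg.svd(M)
    d = np.diag(U.conj().T @ Vh.T)
    d = d / np.abs(d)
    Q = U * np.sqrt(d)

    A = Q.conj().T @ W
    return A, k
```

Applying `A` to a centered two-channel mixture yields components with identity sample covariance and a diagonal sample pseudo-covariance whose entries are the estimated circularity coefficients.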
Keywords :
Fourier transforms; speech processing; time-frequency analysis; SUT; circularity-based statistics; complex-valued subband generation; linear mixture; second-order information; speech analysis; strong uncorrelating transform; time-frequency region; voice activity detection; Demodulation; Random variables; Signal to noise ratio; Speech; Transforms; circularity coefficients; improper; noncircularity; short-time Fourier transform (STFT); voice activity detection (VAD)
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2456426
Filename :
7156090