DocumentCode
11607
Title
Speech Analysis With the Strong Uncorrelating Transform
Author
Okopal, Greg ; Wisdom, Scott ; Atlas, Les
Author_Institution
Appl. Phys. Lab., Univ. of Washington, Seattle, WA, USA
Volume
23
Issue
11
fYear
2015
fDate
Nov. 2015
Firstpage
1858
Lastpage
1868
Abstract
The strong uncorrelating transform (SUT) provides estimates of independent components from linear mixtures using only second-order information, provided that the components have unique circularity coefficients. We propose a processing framework for generating complex-valued subbands from real-valued mixtures of speech and noise where the objective is to control the likely values of the sample circularity coefficients of the underlying speech and noise components in each subband. We show how several processing parameters affect the noncircularity of speech-like and noise components in the subband, ultimately informing parameter choices that allow for estimation of each of the components in a subband using the SUT. Additionally, because the speech and noise components will have unique sample circularity coefficients, this statistic can be used to identify time-frequency regions that contain voiced speech. We give an example of the recovery of the circularity coefficients of a real speech signal from a two-channel noisy mixture at -25 dB SNR, which demonstrates how the estimates of noncircularity can reveal the time-frequency structure of a speech signal in very high levels of noise. Finally, we present the results of a voice activity detection (VAD) experiment showing that two new circularity-based statistics, one of which is derived from the SUT processing, can achieve improved performance over state-of-the-art VADs in real-world recordings of noise.
Keywords
Fourier transforms; speech processing; time-frequency analysis; SUT; circularity-based statistics; complex-valued subband generation; linear mixture; second-order information; speech analysis; strong uncorrelating transform; time-frequency region; voice activity detection (VAD); Demodulation; Random variables; Signal to noise ratio; Speech; Transforms; circularity coefficients; improper; noncircularity; short-time Fourier transform (STFT)
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2456426
Filename
7156090