DocumentCode :
48951
Title :
Exploring Monaural Features for Classification-Based Speech Segregation
Author :
Wang, Yuxuan ; Han, Kun ; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume :
21
Issue :
2
fYear :
2013
fDate :
Feb. 2013
Firstpage :
270
Lastpage :
279
Abstract :
Monaural speech segregation has been a very challenging problem for decades. By casting speech segregation as a binary classification problem, recent advances have been made in computational auditory scene analysis on segregation of both voiced and unvoiced speech. So far, pitch and amplitude modulation spectrogram have been used as two main kinds of time-frequency (T-F) unit level features in classification. In this paper, we expand T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP). Comprehensive comparisons are performed in order to identify effective features for classification-based speech segregation. Our experiments in matched and unmatched test conditions show that these newly included features significantly improve speech segregation performance. Specifically, GFCC and RASTA-PLP are the best single features in matched-noise and unmatched-noise test conditions, respectively. We also find that pitch-based features are crucial for good generalization to unseen environments. To further explore complementarity in terms of discriminative power, we propose to use a group Lasso approach to select complementary features in a principled way. The final combined feature set yields promising results in both matched and unmatched test conditions.
Keywords :
amplitude modulation; cepstral analysis; speech processing; time-frequency analysis; GFCC; PLP; RASTA; T-F unit level feature; amplitude modulation spectrogram; binary classification problem; classification-based speech segregation; computational auditory scene analysis; gammatone frequency cepstral coefficient; group Lasso approach; matched testing; mel-frequency cepstral coefficient; monaural feature exploration; monaural speech segregation; perceptual linear prediction; pitch modulation spectrogram; relative spectral transform; time-frequency unit level feature; unmatched-noise test condition; unvoiced speech; voiced speech; Feature extraction; Mel frequency cepstral coefficient; Signal to noise ratio; Speech; Training; Binary classification; computational auditory scene analysis (CASA); feature combination; group Lasso; monaural speech segregation;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2221459
Filename :
6317144