Robust Multifactor Speech Feature Extraction Based on Gabor Analysis

Author

Wu, Qiang; Zhang, Liqing; Shi, Guangchuan

Author_Institution

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

Volume

19

Issue

4

fYear

2011

fDate

5/1/2011

Firstpage

927

Lastpage

936

Abstract

The performance of speech recognition systems depends on the consistency and adaptability of the speech features under complex conditions during the training and testing stages. Traditional systems usually perform poorly under adverse noisy conditions and are not applicable to most real-world problems. In this paper, we investigate the speech feature extraction problem in noisy environments and propose a novel approach based on Gabor filtering and tensor factorization. Recent physiological and psychoacoustic experiments suggest that localized spectro-temporal features are essential for auditory perception. To exploit this property, we represent the speech signal as a general higher-order tensor and use two-dimensional Gabor functions at different scales and orientations to analyze localized patches of the power spectrogram. We then propose Nonnegative Tensor PCA with sparse constraints to learn projection matrices from multiple interrelated feature subspaces. The sparse constraints aim to preserve the statistical characteristics of clean speech by finding projection matrices for the speech subspaces, while suppressing noise components whose distributions differ from those of clean speech. A multifactor analysis method is further proposed to extract robust sparse features by processing the data samples in their tensor structure. Simulation results indicate that the proposed method improves speech recognition performance compared with traditional speech feature extraction methods, especially in noisy environments.
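The pipeline summarized in the abstract (Gabor analysis of a power spectrogram, a tensor representation, then projection of selected tensor modes) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. The helper names (`gabor_kernel`, `filter_same`, `gabor_tensor`, `mode_n_product`, `project`), the real-valued Gabor form, the wavelengths, orientation count, kernel size, and STFT settings are all hypothetical choices made for illustration. In particular, the paper learns its projection matrices with Nonnegative Tensor PCA under sparse constraints; that learning step is not reproduced here, and random nonnegative matrices stand in as placeholders.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size, wavelength, theta, sigma):
    """Hypothetical real 2-D Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]  # y: frequency axis, x: time axis
    xr = x * np.cos(theta) + y * np.sin(theta)       # coordinate along the carrier direction
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    return kernel - kernel.mean()                    # zero mean: flat patches give no response

def filter_same(image, kernel):
    """2-D cross-correlation with zero padding; output has the same shape as `image`."""
    k = kernel.shape[0]                              # assumes an odd, square kernel
    padded = np.pad(image, k // 2)
    windows = sliding_window_view(padded, (k, k))    # shape (F, T, k, k)
    return np.einsum('ftij,ij->ft', windows, kernel)

def gabor_tensor(log_spec, wavelengths, n_orient, size=11):
    """Stack Gabor magnitudes into a 4th-order tensor: freq x time x scale x orientation."""
    F, T = log_spec.shape
    out = np.empty((F, T, len(wavelengths), n_orient))
    for s, wavelength in enumerate(wavelengths):
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            ker = gabor_kernel(size, wavelength, theta, sigma=wavelength / 2.0)
            out[:, :, s, o] = np.abs(filter_same(log_spec, ker))  # nonnegative responses
    return out

def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (J x I_n) along axis `mode`; that axis becomes length J."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=([1], [0]))
    return np.moveaxis(result, 0, mode)

def project(tensor, matrices, modes):
    """Apply one projection matrix per listed mode (multifactor projection)."""
    for U, m in zip(matrices, modes):
        tensor = mode_n_product(tensor, U, m)
    return tensor

# Toy input: log power spectrogram of a noisy 440 Hz tone sampled at 8 kHz.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000) + 0.1 * rng.standard_normal(8000)
frames = sliding_window_view(signal, 256)[::128] * np.hanning(256)  # (61, 256)
power = np.abs(np.fft.rfft(frames, axis=1)) ** 2                      # (61, 129)
log_spec = np.log(power.T + 1e-10)                                    # freq x time: (129, 61)

G = gabor_tensor(log_spec, wavelengths=[4.0, 8.0], n_orient=4)        # (129, 61, 2, 4)

# Placeholder projections for the frequency, scale, and orientation modes.
# The paper learns these with sparse Nonnegative Tensor PCA; random values are used here.
U_freq, U_scale, U_orient = rng.random((20, 129)), rng.random((2, 2)), rng.random((2, 4))
Y = project(G, [U_freq, U_scale, U_orient], modes=[0, 2, 3])          # (20, 61, 2, 2)
features = np.moveaxis(Y, 1, 0).reshape(Y.shape[1], -1)               # one 80-dim vector per frame
```

Keeping the time mode unprojected yields one feature vector per frame, which is the form a frame-based recognizer would consume; the choice of which modes to project, and to what dimensions, is an assumption of this sketch.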

Keywords

Gabor filters; feature extraction; hearing; speech recognition; tensors; Gabor filtering; auditory perception; noisy environment; nonnegative tensor PCA; power spectrogram; robust multifactor speech feature extraction; spectro-temporal feature; speech recognition system; speech signal; tensor factorization; two-dimensional Gabor function; Acoustic noise

fLanguage

English

Journal_Title

IEEE Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

1558-7916

Type

Journal

DOI

10.1109/TASL.2010.2070495

Filename

5557762