MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments

Author

Suzuki, Masayuki ; Yoshioka, Takuya ; Watanabe, Shinji ; Minematsu, Nobuaki ; Hirose, Keikichi

Author_Institution

Univ. of Tokyo, Tokyo, Japan

fYear

2012

fDate

25-30 March 2012

Firstpage

4109

Lastpage

4112

Abstract

One of the most effective approaches to noise-robust speech recognition is to remove the effect of noise directly from corrupted MFCC vectors. However, vector Taylor series (VTS) enhancement, a typical MFCC enhancement method, provides limited improvement when the noise is highly non-stationary. This is because VTS enhancement cannot use a time-varying noise model if the computational cost is to be kept at an acceptable level. This paper proposes a method that enhances MFCC vectors and their dynamic parameters using noise estimates that change on a frame-by-frame basis, at a practical computational cost. The proposed method employs stereo data-based feature mapping, like the well-known SPLICE algorithm. Its novelty is that it uses the joint space spanned by a concatenated vector of corrupted and noise features. Linear discriminant analysis is also proposed as a way to reduce the dimensionality of the joint space effectively. The proposed method achieves 19.1% and 8.3% relative error reductions compared with the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
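
The abstract does not give the formulation, so the following is only a minimal sketch of one plausible reading: a SPLICE-style bias correction in which the mixture posteriors are computed on the concatenated vector of corrupted features and per-frame noise estimates. All function names, the fixed two-component diagonal Gaussian mixture, and the toy additive data are assumptions made for illustration, not the authors' implementation. The LDA dimensionality reduction and the dynamic parameters mentioned in the abstract are omitted.

```python
import numpy as np

def log_gauss_diag(z, means, variances):
    # Log density of each frame in z (T, D) under K diagonal Gaussians (K, D).
    diff = z[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def posteriors(z, weights, means, variances):
    # Component posteriors p(k | z) for each frame, shape (T, K).
    log_p = np.log(weights) + log_gauss_diag(z, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # for numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def train_biases(clean, corrupted, post):
    # SPLICE-style bias per component: posterior-weighted mean of (clean - corrupted).
    num = post.T @ (clean - corrupted)
    den = post.sum(axis=0)[:, None]
    return num / np.maximum(den, 1e-10)

def enhance(corrupted, post, biases):
    # Enhanced feature = corrupted feature + posterior-weighted bias.
    return corrupted + post @ biases

# Toy stereo data (assumption): the noise level switches every frame and
# shifts the clean features additively.
rng = np.random.default_rng(0)
T, D = 200, 2
clean = rng.normal(size=(T, D))
levels = np.where(np.arange(T) % 2 == 0, 0.0, 5.0)
noise_est = np.repeat(levels[:, None], D, axis=1)  # per-frame noise estimate
corrupted = clean + noise_est

# Joint vector of corrupted and noise features, one row per frame.
joint = np.hstack([corrupted, noise_est])

# Hypothetical mixture over the joint space; a real system would train it.
weights = np.array([0.5, 0.5])
means = np.vstack([np.zeros(2 * D), np.full(2 * D, 5.0)])
variances = np.ones((2, 2 * D))

post = posteriors(joint, weights, means, variances)
biases = train_biases(clean, corrupted, post)
enhanced = enhance(corrupted, post, biases)

print("MSE before:", np.mean((corrupted - clean) ** 2))
print("MSE after: ", np.mean((enhanced - clean) ** 2))
```

Because the posteriors depend on the noise estimate of each frame, the selected bias can follow a noise level that changes from frame to frame, while the cost per frame stays at one posterior computation and one weighted sum.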

Keywords

approximation theory; cepstral analysis; speech recognition; MFCC vector enhancement; VTS enhancement method; computational cost; corrupted concatenated vector; highly nonstationary noise environments; joint corrupted space; linear discriminant analysis; mel frequency cepstral coefficients; noise feature concatenated vector; noise feature space; noise robust speech recognition; noise-mean normalized SPLICE algorithms; stereo data-based feature mapping; time-varying noise model; vector Taylor series approximation-based algorithms; Accuracy; Joints; Mel frequency cepstral coefficient; Noise; Speech; Speech recognition; Vectors; Noise robust ASR; SPLICE; non-stationary noise

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6288822

Filename

6288822