DocumentCode
3161940
Title
MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments
Author
Suzuki, Masayuki ; Yoshioka, Takuya ; Watanabe, Shinji ; Minematsu, Nobuaki ; Hirose, Keikichi
Author_Institution
Univ. of Tokyo, Tokyo, Japan
fYear
2012
fDate
25-30 March 2012
Firstpage
4109
Lastpage
4112
Abstract
One of the most effective approaches to noise-robust speech recognition is to remove the effect of noise directly from corrupted MFCC vectors. However, VTS enhancement, a typical method for MFCC enhancement, provides limited improvement when the noise is highly non-stationary. This is because VTS enhancement cannot use a time-varying noise model if the computational cost is to be kept at an acceptable level. This paper proposes a method that enhances MFCC vectors and their dynamic parameters using noise estimates that change on a frame-by-frame basis, at a practical computational cost. The proposed method employs stereo data-based feature mapping, like the well-known SPLICE algorithm. Its novelty is that it uses the joint space spanned by a concatenated vector of corrupted and noise features. Linear discriminant analysis is also proposed to effectively reduce the dimensionality of the joint space. The proposed method achieves 19.1% and 8.3% relative error reductions over the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
Keywords
approximation theory; cepstral analysis; speech recognition; MFCC vector enhancement; VTS enhancement method; computational cost; corrupted concatenated vector; highly nonstationary noise environments; joint corrupted space; linear discriminant analysis; mel frequency cepstral coefficients; noise feature concatenated vector; noise feature space; noise robust speech recognition; noise-mean normalized SPLICE algorithms; stereo data-based feature mapping; time-varying noise model; vector Taylor series approximation-based algorithms; Accuracy; Joints; Mel frequency cepstral coefficient; Noise; Speech; Speech recognition; Vectors; Noise robust ASR; SPLICE; non-stationary noise;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location
Kyoto
ISSN
1520-6149
Print_ISBN
978-1-4673-0045-2
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2012.6288822
Filename
6288822