• DocumentCode
    3161940
  • Title

    MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments

  • Author

    Suzuki, Masayuki ; Yoshioka, Takuya ; Watanabe, Shinji ; Minematsu, Nobuaki ; Hirose, Keikichi

  • Author_Institution
    Univ. of Tokyo, Tokyo, Japan
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4109
  • Lastpage
    4112
  • Abstract
    One of the most effective approaches to noise-robust speech recognition is to remove the noise effect directly from corrupted MFCC vectors. However, VTS enhancement, a typical MFCC enhancement method, provides limited improvement when the noise is highly non-stationary. This is because, to keep the computational cost at an acceptable level, the VTS enhancement method cannot use a time-varying noise model. This paper proposes a method that enhances MFCC vectors and their dynamic parameters by using noise estimates that change on a frame-by-frame basis, at a practical computational cost. The proposed method employs stereo data-based feature mapping, like the well-known SPLICE algorithm. Its novelty lies in using the joint space spanned by a concatenated vector of corrupted and noise features. The paper also proposes using linear discriminant analysis to effectively reduce the dimensionality of the joint space. The proposed method achieves relative error reductions of 19.1% and 8.3% over the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
  • Keywords
    approximation theory; cepstral analysis; speech recognition; MFCC vector enhancement; VTS enhancement method; computational cost; corrupted concatenated vector; highly nonstationary noise environments; joint corrupted space; linear discriminant analysis; mel frequency cepstral coefficients; noise feature concatenated vector; noise feature space; noise robust speech recognition; noise-mean normalized SPLICE algorithms; stereo data-based feature mapping; time-varying noise model; vector Taylor series approximation-based algorithms; Accuracy; Joints; Mel frequency cepstral coefficient; Noise; Speech; Speech recognition; Vectors; Noise robust ASR; SPLICE; non-stationary noise
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2012.6288822
  • Filename
    6288822
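
The abstract describes a SPLICE-like stereo mapping applied in a joint space of corrupted and noise features, but this record gives no equations. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes equal-weight spherical Gaussian components, component means picked by hand from the toy data, and a toy additive corruption `y = x + n`; the linear discriminant analysis step is omitted. All function and variable names are hypothetical.

```python
import numpy as np

def posteriors(z, means, var=1.0):
    """Soft assignment p(k | z) under equal-weight spherical Gaussians (assumption)."""
    d2 = ((z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (T, K) squared distances
    logp = -0.5 * d2 / var
    logp -= logp.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def train_biases(y, n, x, means, var=1.0):
    """Per-component correction r_k: posterior-weighted mean of (x - y) over stereo data."""
    z = np.hstack([y, n])          # joint vector: corrupted features + noise estimate
    p = posteriors(z, means, var)  # (T, K)
    return (p.T @ (x - y)) / (p.sum(axis=0)[:, None] + 1e-12)  # (K, D)

def enhance(y, n, means, biases, var=1.0):
    """SPLICE-style estimate: x_hat = y + sum_k p(k | z) r_k, with z = [y, n]."""
    z = np.hstack([y, n])
    return y + posteriors(z, means, var) @ biases

# Toy stereo data: the noise level switches frame by frame between two values.
rng = np.random.default_rng(0)
T, D = 400, 3
x = rng.normal(size=(T, D))                           # "clean" features
level = rng.integers(0, 2, size=(T, 1)) * 4.0         # per-frame noise level: 0.0 or 4.0
n = level + 0.1 * rng.normal(size=(T, D))             # frame-wise noise estimate
y = x + n                                             # toy additive corruption (assumption)

# One joint-space component per noise level, centred on that level's frames.
means = np.vstack([
    np.hstack([y[level[:, 0] == v].mean(axis=0), n[level[:, 0] == v].mean(axis=0)])
    for v in (0.0, 4.0)
])

biases = train_biases(y, n, x, means)
x_hat = enhance(y, n, means, biases)

mse_corrupted = float(np.mean((y - x) ** 2))
mse_enhanced = float(np.mean((x_hat - x) ** 2))
print(mse_corrupted, mse_enhanced)
```

Because the component posterior is computed from the concatenated vector rather than from the corrupted features alone, the correction applied to each frame can follow the noise estimate for that frame. In this toy setup that is what lets the mapping track the switching noise level.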