• DocumentCode
    3530381
  • Title
    Affine invariant features and their application to speech recognition
  • Author
    Qiao, Yu ; Suzuki, Masayuki ; Minematsu, Nobuaki
  • Author_Institution
    Grad. Sch. of Eng., Univ. of Tokyo, Tokyo
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4629
  • Lastpage
    4632
  • Abstract
    This paper proposes a set of affine invariant features (AIFs) for sequence data. The proposed AIFs can be calculated directly from the sequence data, and their invariance to affine transformation is proved mathematically through algebraic calculation. We apply the AIFs to speech recognition. Since differences in vocal tract length (VTL) cause frequency warping, which can be approximated well by an affine transform on cepstral features, the AIFs of a cepstral sequence provide features that are robust to VTL variation. We experimentally examine the invariance of the AIFs of speech signals and apply AIFs to Japanese isolated word recognition. The experimental results show that combining AIFs with MFCC or MFCC+Delta can lead to higher recognition rates than MFCC or MFCC+Delta alone. In the mismatched experiments in particular, the combination with AIFs can reduce error rates by about 30% compared to MFCC or MFCC+Delta alone. Because their invariance is general, the AIFs are expected to have applications beyond speech recognition.
  • Keywords
    algebra; cepstral analysis; speech recognition; transforms; Japanese isolated word recognition; MFCC+Delta; affine invariant features; affine transformation; algebraic calculation; cepstral features; frequency warping; sequence data; vocal tract length; Data engineering; Error analysis; Loudspeakers; Mel frequency cepstral coefficient; Pattern recognition; Robustness; Speech processing; Vectors; Affine invariant feature; speaker normalization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960662
  • Filename
    4960662
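
The abstract states that features computed directly from a sequence can be invariant to affine transformation. This record does not give the paper's AIF definition, so the sketch below is only a generic illustration of that idea and not the authors' method. It assumes a 2-D sequence and uses a standard fact: under x -> Ax + b, difference vectors are mapped by A, so the signed area spanned by two differences is scaled by det(A), and a ratio of two such areas is unchanged. All function names and the sample data are hypothetical.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - u[1] * v[0]


def diff(p, q):
    """Component-wise difference p - q of two 2-D points."""
    return (p[0] - q[0], p[1] - q[1])


def area(seq, t):
    """Signed area spanned by the two consecutive differences starting at frame t."""
    return det2(diff(seq[t + 1], seq[t]), diff(seq[t + 2], seq[t + 1]))


def area_ratio(seq, i, j):
    """Ratio of signed areas at frames i and j (illustrative invariant, assumed)."""
    return area(seq, i) / area(seq, j)


def affine(seq, A, b):
    """Apply x -> A x + b to every point of a 2-D sequence."""
    return [
        (A[0][0] * x + A[0][1] * y + b[0], A[1][0] * x + A[1][1] * y + b[1])
        for x, y in seq
    ]


# Hypothetical toy sequence and an arbitrary invertible affine map.
seq = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0), (6.0, 2.0)]
A = [[2.0, 0.5], [-0.3, 1.5]]
b = (0.7, -1.2)
warped = affine(seq, A, b)

original_ratio = area_ratio(seq, 0, 3)
warped_ratio = area_ratio(warped, 0, 3)
print(original_ratio, warped_ratio)
```

The raw coordinates of `warped` differ from those of `seq`, while the two ratios agree up to floating-point rounding. The same argument extends to d dimensions by using d consecutive difference vectors and a d-by-d determinant.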