• DocumentCode
    2800541
  • Title
    Selecting static and dynamic features using an advanced auditory model for speech recognition
  • Author
    Koniaris, Christos ; Chatterjee, Saikat ; Kleijn, W. Bastiaan
  • Author_Institution
    Sound & Image Process. Lab., KTH - R. Inst. of Technol., Stockholm, Sweden
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    4342
  • Lastpage
    4345
  • Abstract
    We describe a method to select features for speech recognition that is based on a quantitative model of the human auditory periphery. The method maximizes the similarity of the geometry of the space spanned by the subset of features and the geometry of the space spanned by the auditory model output. The selection method uses a spectro-temporal auditory model that captures both frequency- and time-domain masking. The selection method is blind to the meaning of speech and does not require annotated speech data. We apply the method to the selection of a subset of features from a conventional set consisting of mel cepstra and their first-order and second-order time derivatives. Although our method uses only knowledge of the human auditory periphery, the experimental results show that it performs significantly better than feature-reduction algorithms based on linear and heteroscedastic discriminant analysis that require training with annotated speech data.
  • Keywords
    cepstral analysis; feature extraction; speech intelligibility; speech recognition; dynamic feature; feature selection; first-order time derivatives; frequency-domain masking; geometry; human auditory periphery; mel cepstra; quantitative model; second-order time derivatives; spectro-temporal auditory model; speech recognition; static feature; time-domain masking; Algorithm design and analysis; Cepstral analysis; Frequency; Geometry; Humans; Performance analysis; Solid modeling; Speech analysis; Speech recognition; Time domain analysis; auditory model; dimension reduction; distortion; feature selection; perception; sensitivity analysis; speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495648
  • Filename
    5495648