• DocumentCode
    2182709
  • Title

    Sparse coding of auditory features for machine hearing in interference

  • Author

    Lyon, Richard F. ; Ponte, Jay ; Chechik, Gal

  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5876
  • Lastpage
    5879
  • Abstract
    A key problem in using the output of an auditory model as the input to a machine-learning system in a machine-hearing application is to find a good feature-extraction layer. For systems such as PAMIR (passive-aggressive model for image retrieval) that work well with a large sparse feature vector, a conversion from auditory images to sparse features is needed. For audio-file ranking and retrieval from text queries, based on stabilized auditory images, we took a multi-scale approach, using vector quantization to choose one sparse feature in each of many overlapping regions of different scales, with the hope that in some regions the features for a sound would be stable even when other interfering sounds were present and affecting other regions. We recently extended our testing of this approach using sound mixtures, and found that the sparse-coded auditory-image features degrade less in interference than vector-quantized MFCC sparse features do. This initial success suggests that our hope of robustness in interference may indeed be realizable, via the general idea of sparse features that are localized in a domain where signal components tend to be localized or stable.
  • Keywords
    feature extraction; image retrieval; learning (artificial intelligence); vector quantisation; PAMIR; audio file ranking; machine hearing; machine learning system; passive-aggressive model for image retrieval; sound mixture; sparse coded auditory image feature; sparse coding; vector quantization; Encoding; Interference; Mel frequency cepstral coefficient; Sparse matrices; Testing; Training; Auditory image; sound retrieval and ranking; sparse code
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2011.5947698
  • Filename
    5947698
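
  • Illustration

    The multi-scale scheme described in the abstract (vector quantization that selects one sparse feature in each of many overlapping regions of different scales) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the region sizes, the 50% overlap, the codebook size of 16, and the helper names `make_regions` and `sparse_code` are all hypothetical, the "auditory image" is random data, and the codebooks are random where a real system would presumably learn them (for example with k-means). The downstream PAMIR ranking step is not shown.

```python
import numpy as np

def make_regions(height, width, sizes, overlap=0.5):
    """Return (top, left, h, w) boxes covering the image at several scales.

    `sizes` and `overlap` are illustrative assumptions, not values from the paper.
    """
    regions = []
    for h, w in sizes:
        step_r = max(1, int(h * (1 - overlap)))
        step_c = max(1, int(w * (1 - overlap)))
        for top in range(0, height - h + 1, step_r):
            for left in range(0, width - w + 1, step_c):
                regions.append((top, left, h, w))
    return regions

def sparse_code(image, regions, codebooks):
    """Pick the nearest codeword in each region (vector quantization).

    Returns the indices of the active features in the concatenated
    sparse vector, plus that vector's total dimension. Exactly one
    feature is active per region.
    """
    active = []
    offset = 0
    for (top, left, h, w), codebook in zip(regions, codebooks):
        patch = image[top:top + h, left:left + w].ravel()
        # Squared Euclidean distance from the patch to every codeword.
        dists = np.sum((codebook - patch) ** 2, axis=1)
        active.append(offset + int(np.argmin(dists)))
        offset += codebook.shape[0]
    return np.array(active), offset

# Toy data: a random 32x64 "auditory image" frame and random codebooks.
rng = np.random.default_rng(0)
image = rng.random((32, 64))
sizes = [(8, 16), (16, 32), (32, 64)]   # hypothetical region scales
regions = make_regions(32, 64, sizes)
codebooks = [rng.random((16, h * w)) for (_, _, h, w) in regions]

active, dim = sparse_code(image, regions, codebooks)
print(f"{len(active)} active features out of {dim}")
```

    Because each region contributes exactly one active index, a disturbance confined to some regions changes only those regions' features and leaves the rest of the code untouched, which is the intuition behind the robustness the abstract hopes for.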