• DocumentCode
    1796897
  • Title
    Using compressed audio-visual words for multi-modal scene classification
  • Author
    Kurcius, Jan J. ; Breckon, Toby P.
  • Author_Institution
    Cranfield Univ., Cranfield, UK
  • fYear
    2014
  • fDate
    1-2 Nov. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    We present a novel approach to scene classification using combined audio signal and video image features and compare this methodology to scene classification results using each modality in isolation. Each modality is represented using summary features, namely Mel-frequency Cepstral Coefficients (audio) and Scale Invariant Feature Transform (SIFT) (video) within a multi-resolution bag-of-features model. Uniquely, we extend the classical bag-of-words approach over both audio and video feature spaces, whereby we introduce the concept of compressive sensing as a novel methodology for multi-modal fusion via audio-visual feature dimensionality reduction. We perform evaluation over a range of environments showing performance that is both comparable to the state of the art (86%, over ten scene classes) and invariant to a ten-fold dimensionality reduction within the audio-visual feature space using our compressive representation approach.
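    The fusion step described in the abstract can be illustrated with a minimal sketch. It builds one bag-of-words histogram per modality, concatenates them, and reduces the joint vector ten-fold with a random projection. This is not the authors' implementation: the Gaussian measurement matrix, the vocabulary sizes, the random stand-in descriptors, and the function names bag_of_words and compress are all assumptions made for illustration, and the multi-resolution aspect of the model is omitted.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

def compress(x, ratio=10, seed=0):
    """Random Gaussian projection to len(x) // ratio dimensions (assumed scheme)."""
    rng = np.random.default_rng(seed)
    m = max(1, x.shape[0] // ratio)
    phi = rng.standard_normal((m, x.shape[0])) / np.sqrt(m)
    return phi @ x

rng = np.random.default_rng(42)
# Random stand-ins: 13-d vectors for MFCC frames, 128-d vectors for SIFT descriptors.
audio_desc = rng.standard_normal((200, 13))
visual_desc = rng.standard_normal((100, 128))
# Hypothetical vocabularies; in practice these would be learned, e.g. by clustering.
audio_codebook = rng.standard_normal((50, 13))
visual_codebook = rng.standard_normal((200, 128))

audio_hist = bag_of_words(audio_desc, audio_codebook)
visual_hist = bag_of_words(visual_desc, visual_codebook)
joint = np.concatenate([audio_hist, visual_hist])  # 250-d audio-visual word vector
compressed = compress(joint)                       # 25-d after ten-fold reduction
print(joint.shape, compressed.shape)
```

    In a full pipeline the compressed vector would be passed to a classifier; the keywords list support vector machines, but the classifier configuration is not given in this record.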
  • Keywords
    audio signal processing; cepstral analysis; image classification; image fusion; image resolution; transforms; video coding; SIFT; audio feature space; audio signal feature; audio-visual feature dimensionality reduction; audio-visual feature space; compressed audio-visual word; compressive representation; compressive sensing; mel-frequency cepstral coefficients; multimodal fusion; multimodal scene classification; multiresolution bag-of-features model; scale invariant feature transform; video feature space; video image feature; Accuracy; Compressed sensing; Feature extraction; Mel frequency cepstral coefficient; Support vector machines; Visualization; Vocabulary; MFCC; audio-visual; bag of words; compressed sensing; multi-modal; multi-resolution;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence for Multimedia Understanding (IWCIM), 2014 International Workshop on
  • Conference_Location
    Paris
  • Type
    conf
  • DOI
    10.1109/IWCIM.2014.7008808
  • Filename
    7008808