• DocumentCode
    1702410
  • Title

    Multimodal and Multi-task Audio-Visual Vehicle Detection and Classification

  • Author

    Wang, Tao ; Zhu, Zhigang

  • fYear
    2012
  • Firstpage
    440
  • Lastpage
    446
  • Abstract
    Moving vehicle detection and classification using multimodal data is challenging in several respects: data collection, audio-visual alignment, feature selection, and effective vehicle classification in uncontrolled environments. In this work, we first present a systematic way to align the multimodal data based on multimodal temporal panorama generation. Various types of features are then extracted to represent diverse multimodal information. These include global geometric features (aspect ratios, profiles), local structure features (HOGs), and various audio features in both spectral and perceptual representations. A flexible sequential forward selection algorithm with multi-branch searching is used to select a set of important features at different levels of feature combination. Finally, using the same datasets for two different classification tasks, we show that the roles of audio and visual features are task-specific. Furthermore, in both cases, combining some of the features that carry multimodal and complementary information improves accuracy over using the individual features alone. Therefore, finer and more accurate classification can be achieved through two different levels of integration: the feature level and the decision level.
  • Keywords
    audio signal processing; feature extraction; image classification; object detection; spectral analysis; traffic engineering computing; HOG; aspect ratios; audio features; audio-visual alignment; data collection; decision level; feature extraction; feature level; feature selection; flexible sequential forward selection algorithm; global geometric features; local structure features; moving vehicle classification; multibranch searching; multimodal audio-visual vehicle detection; multimodal temporal panorama generation; multitask audio-visual vehicle detection; perceptual representations; profiles; spectral representations; Accuracy; Feature extraction; Image reconstruction; Mel frequency cepstral coefficient; Vehicle detection; Vehicles; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Video and Signal-Based Surveillance (AVSS), 2012 IEEE Ninth International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2499-1
  • Type

    conf

  • DOI
    10.1109/AVSS.2012.47
  • Filename
    6328054
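
  • Note
    The abstract names a sequential forward selection algorithm with multi-branch searching but gives no details. The sketch below is one generic reading of that idea, not the authors' method: at each level it keeps the best few subsets (the `branches` parameter, an assumption) instead of a single one. The function name, the feature names, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def multi_branch_sfs(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_size: int,
    branches: int = 2,
) -> Tuple[FrozenSet[str], float]:
    """Forward selection that keeps the `branches` best subsets per level.

    Hypothetical helper; with branches=1 it reduces to plain greedy
    sequential forward selection.
    """
    frontier: List[Tuple[float, FrozenSet[str]]] = [(float("-inf"), frozenset())]
    best_score, best_subset = float("-inf"), frozenset()
    for _ in range(min(max_size, len(features))):
        # Grow every kept subset by one unused feature; the set drops duplicates.
        candidates = {
            subset | {f}
            for _, subset in frontier
            for f in features
            if f not in subset
        }
        # Highest score first; ties broken by sorted feature names.
        scored = sorted(
            ((score(c), c) for c in candidates),
            key=lambda item: (-item[0], sorted(item[1])),
        )
        frontier = scored[:branches]
        if frontier and frontier[0][0] > best_score:
            best_score, best_subset = frontier[0]
    return best_subset, best_score


def toy_score(subset: FrozenSet[str]) -> int:
    """Made-up score: individual gains, a synergy bonus, and a size penalty."""
    gains = {"aspect_ratio": 5, "hog": 6, "mfcc": 4}  # illustrative values only
    bonus = 4 if {"aspect_ratio", "mfcc"} <= subset else 0
    return sum(gains[f] for f in subset) + bonus - 3 * len(subset)


FEATURES = ["aspect_ratio", "hog", "mfcc"]
single = multi_branch_sfs(FEATURES, toy_score, max_size=2, branches=1)
multi = multi_branch_sfs(FEATURES, toy_score, max_size=2, branches=2)
```

    In this toy setup the single-branch search commits to `hog` first and ends with a lower-scoring pair, while keeping two branches lets the search reach the complementary audio-visual pair. In practice `score` would be something like cross-validated classification accuracy.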