• DocumentCode
    3764145
  • Title
    WTA Hash-Based Multimodal Feature Fusion for 3D Human Action Recognition

  • Author
    Jun Ye;Kai Li;Kien A. Hua

  • Author_Institution
    Dept. of Electr. Eng. &
  • fYear
    2015
  • Firstpage
    184
  • Lastpage
    190
  • Abstract
    With the prevalence of commodity depth sensors (e.g., Kinect), multimodal data including RGB, depth, and audio streams have been utilized in various applications such as video games, education, and health. Nevertheless, effectively fusing features from multimodal data remains very challenging. In this paper, we propose a WTA (Winner-Take-All) Hash-based feature fusion algorithm and investigate its application in 3D human action recognition. Specifically, WTA Hashing is performed to encode features from different modalities into the ordinal space. By leveraging ordinal measures rather than the absolute values of the original features, this embedding provides resilience to scale and numerical perturbations. We propose a frame-level feature fusion algorithm and develop a WTA Hash-embedded warping algorithm to measure the similarity between two sequences. Experiments on three public 3D human action datasets show that the proposed fusion algorithm achieves state-of-the-art recognition results even with a simple nearest-neighbor search.
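    The ordinal embedding the abstract describes can be sketched as follows. This is a minimal illustration of generic WTA hashing, not the paper's implementation; the window size, hash count, and function names are illustrative assumptions. Each hash applies a random permutation to a feature vector and records the index of the largest value within a small window, so only the relative ordering of feature values matters:

    ```python
    import numpy as np

    def wta_hash(x, perms, window=4):
        # For each random permutation, inspect the first `window` permuted
        # entries and record the index of the largest one. Codes depend only
        # on the ordering of values, not their magnitudes.
        codes = np.empty(len(perms), dtype=np.int64)
        for i, p in enumerate(perms):
            codes[i] = np.argmax(x[p[:window]])
        return codes

    def wta_similarity(c1, c2):
        # Fraction of matching ordinal codes (1 - normalized Hamming distance).
        return float(np.mean(c1 == c2))

    rng = np.random.default_rng(0)
    dim, n_hashes = 16, 64
    perms = [rng.permutation(dim) for _ in range(n_hashes)]

    x = rng.random(dim)
    # Scaling by a positive factor and adding an offset preserve the ordering,
    # so the codes are identical -- the scale resilience the abstract mentions.
    assert wta_similarity(wta_hash(x, perms), wta_hash(3.0 * x + 1.0, perms)) == 1.0
    ```

    Because codes from different modalities share this ordinal space, fused similarity can be computed by simple code matching, which is what makes nearest-neighbor search effective in the paper's setting.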
  • Keywords
    "Three-dimensional displays","Hamming distance","Fuses","Nearest neighbor searches","Feature extraction","Robustness","Sensors"
  • Publisher
    IEEE
  • Conference_Titel
    2015 IEEE International Symposium on Multimedia (ISM)
  • Type
    conf

  • DOI
    10.1109/ISM.2015.11
  • Filename
    7442322