  • DocumentCode
    64515
  • Title
    Spatio-Temporal Laplacian Pyramid Coding for Action Recognition
  • Author
    Ling Shao; Xiantong Zhen; Dacheng Tao; Xuelong Li
  • Author_Institution
    College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
  • Volume
    44
  • Issue
    6
  • fYear
    2014
  • fDate
    Jun. 2014
  • Firstpage
    817
  • Lastpage
    827
  • Abstract
    We present a novel descriptor, called spatio-temporal Laplacian pyramid coding (STLPC), for the holistic representation of human actions. In contrast to sparse representations based on detected local interest points, STLPC treats a video sequence as a whole and extracts spatio-temporal features directly from it, which avoids the information loss inherent in sparse representations. By decomposing each sequence into a set of band-pass-filtered components, the proposed pyramid model localizes features residing at different scales and is therefore able to encode the motion information of actions effectively. To make the features more invariant and robust to distortions and noise, a bank of 3-D Gabor filters is applied to each level of the Laplacian pyramid, followed by max pooling within filter bands and over spatio-temporal neighborhoods. Since convolution and pooling are both performed spatio-temporally, the coding model captures structural and motion information simultaneously and provides an informative representation of actions. The proposed method achieves high recognition rates on the KTH, multiview IXMAS, challenging UCF Sports, and newly released HMDB51 datasets, outperforming state-of-the-art methods and demonstrating its potential for action recognition.
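    The core of the pipeline described in the abstract (band-pass decomposition of a video volume, then max pooling over spatio-temporal neighborhoods) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. It assumes a grayscale video stored as nested lists indexed [t][y][x] with every dimension divisible by 2**levels, uses a 2×2×2 box average in place of the usual Gaussian low-pass filter, upsamples by nearest-neighbour replication, and omits the 3-D Gabor filter bank entirely, pooling the absolute band-pass responses instead. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: a grayscale video volume indexed v[t][y][x].
Volume = List[List[List[float]]]


def shape(v: Volume) -> Tuple[int, int, int]:
    return len(v), len(v[0]), len(v[0][0])


def downsample(v: Volume) -> Volume:
    """Halve t, y and x by averaging non-overlapping 2x2x2 blocks (box low-pass, not Gaussian)."""
    t, h, w = shape(v)
    return [[[sum(v[k + a][i + b][j + c]
                  for a in (0, 1) for b in (0, 1) for c in (0, 1)) / 8.0
              for j in range(0, w, 2)]
             for i in range(0, h, 2)]
            for k in range(0, t, 2)]


def upsample(v: Volume) -> Volume:
    """Double t, y and x by nearest-neighbour replication."""
    t, h, w = shape(v)
    return [[[v[k // 2][i // 2][j // 2] for j in range(2 * w)]
             for i in range(2 * h)]
            for k in range(2 * t)]


def subtract(a: Volume, b: Volume) -> Volume:
    return [[[x - y for x, y in zip(ra, rb)] for ra, rb in zip(pa, pb)] for pa, pb in zip(a, b)]


def add(a: Volume, b: Volume) -> Volume:
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(pa, pb)] for pa, pb in zip(a, b)]


def laplacian_pyramid(v: Volume, levels: int) -> List[Volume]:
    """Return `levels` band-pass volumes followed by the final low-pass residual."""
    pyramid = []
    current = v
    for _ in range(levels):
        low = downsample(current)
        # Band-pass level: the detail lost when the volume is low-passed and decimated.
        pyramid.append(subtract(current, upsample(low)))
        current = low
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid: List[Volume]) -> Volume:
    """Invert the decomposition: upsample the coarser level and add back each band."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = add(upsample(current), band)
    return current


def max_pool(v: Volume, size: int) -> Volume:
    """Max of |response| over non-overlapping size x size x size spatio-temporal blocks."""
    t, h, w = shape(v)
    return [[[max(abs(v[k + a][i + b][j + c])
                  for a in range(size) for b in range(size) for c in range(size))
              for j in range(0, w - size + 1, size)]
             for i in range(0, h - size + 1, size)]
            for k in range(0, t - size + 1, size)]


if __name__ == "__main__":
    video = [[[float((k * 7 + i * 3 + j) % 5) for j in range(8)] for i in range(8)] for k in range(8)]
    pyr = laplacian_pyramid(video, levels=2)
    print([shape(level) for level in pyr])  # [(8, 8, 8), (4, 4, 4), (2, 2, 2)]
    features = [max_pool(band, 2) for band in pyr[:-1]]
    print([shape(f) for f in features])  # [(4, 4, 4), (2, 2, 2)]
```

    For an 8×8×8 volume and levels=2, the sketch yields band-pass levels of shape (8, 8, 8) and (4, 4, 4) plus a (2, 2, 2) low-pass residual. reconstruct() recovers the input exactly because each band stores precisely what upsampling the coarser level misses, so no information is discarded before pooling.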
  • Keywords
    Gabor filters; image motion analysis; image recognition; image representation; image sequences; video signal processing; 3-D Gabor filters; HMDB51 datasets; KTH; STLPC; UCF Sports; action recognition; band-pass-filtered components; detected local interest points; filter bands; holistic representation; human actions; max pooling; motion information; multiview IXMAS; pyramid model; sparse representations; spatio-temporal Laplacian pyramid coding; spatio-temporal features; spatio-temporal neighborhoods; video sequence; Educational institutions; Encoding; Feature extraction; Laplace equations; Tracking; Trajectory; Video sequences; Action recognition; computer vision; spatio-temporal Laplacian pyramid
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Cybernetics
  • Publisher
    IEEE
  • ISSN
    2168-2267
  • Type
    jour
  • DOI
    10.1109/TCYB.2013.2273174
  • Filename
    6572804