• DocumentCode
    248034
  • Title

    Sum-max video pooling for complex event recognition

  • Author

    Sang Phan ; Duy-Dinh Le ; Satoh, S.

  • Author_Institution
    Grad. Univ. for Adv. Studies (SOKENDAI), Yokosuka, Japan
  • fYear
    2014
  • fDate
    27-30 Oct. 2014
  • Firstpage
    1026
  • Lastpage
    1030
  • Abstract
    A video can be viewed as a layered structure in which the lowest layer consists of frames, the top layer is the entire video, and the middle layers are sequences of consecutive frames or concatenations of lower layers. While it is easy to find local discriminative features in the lower layers of a video, it is non-trivial to aggregate these features into a discriminative video representation. In the literature, sum pooling is often used to obtain reasonable recognition performance on artificial videos. However, sum pooling does not work well on complex videos because the regions of interest may reside within some middle layers. In this paper, we leverage the layered structure of video to propose a new pooling method, named sum-max video pooling, to handle this problem. Specifically, we apply sum pooling at the low-layer representation and max pooling at the high-layer representation. Sum pooling keeps sufficient relevant features at the low layer, while max pooling retrieves the most relevant features at the high layer and can therefore discard irrelevant features in the final video representation. Experimental results on the TRECVID Multimedia Event Detection 2010 dataset show the effectiveness of our method.
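    The two-stage pooling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sum_max_pool`, the use of fixed-length non-overlapping segments as the "middle layers", and the element-wise maximum across segments are all assumptions made here for illustration.

```python
import numpy as np


def sum_max_pool(frame_features, segment_length):
    """Illustrative sum-max pooling (hypothetical helper, not from the paper).

    frame_features: array-like of shape (num_frames, feature_dim).
    segment_length: number of consecutive frames per segment (assumed fixed).
    Returns a vector of shape (feature_dim,).
    """
    features = np.asarray(frame_features, dtype=float)
    if features.ndim != 2 or features.shape[0] == 0:
        raise ValueError("frame_features must be a non-empty 2-D array")
    if segment_length < 1:
        raise ValueError("segment_length must be at least 1")

    # Low layer: sum-pool the frames inside each non-overlapping segment.
    # The last segment may be shorter if the frame count is not a multiple.
    starts = range(0, features.shape[0], segment_length)
    segment_sums = np.stack(
        [features[s:s + segment_length].sum(axis=0) for s in starts]
    )

    # High layer: element-wise max over segments keeps, for each feature
    # dimension, the strongest segment-level response.
    return segment_sums.max(axis=0)


if __name__ == "__main__":
    frames = [[1, 0], [1, 0], [0, 3], [0, 1], [1, 0], [0, 0]]
    print(sum_max_pool(frames, segment_length=2))  # sum-max pooling
    print(np.sum(frames, axis=0))                  # plain sum pooling
```

    In this toy input, plain sum pooling accumulates the weak response of the last segment in the first dimension, whereas the max over segment sums keeps only the strongest segment per dimension.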
  • Keywords
    image recognition; image representation; video signal processing; TRECVID Multimedia Event Detection 2010 dataset; artificial videos; complex event recognition; complex videos; discriminative video representation; feature aggregation; frame sequences; high-layer representation; layered structure; local discriminative features; low-layer representation; lower-layer concatenation; middle layers; recognition performance; region of interests; sum-max video pooling; top layer; video representation; Aggregates; Computer vision; Event detection; Feature extraction; Multimedia communication; Noise measurement; Visualization; max-pooling; multimedia event detection; sum-pooling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Image Processing (ICIP), 2014 IEEE International Conference on
  • Conference_Location
    Paris
  • Type

    conf

  • DOI
    10.1109/ICIP.2014.7025204
  • Filename
    7025204