DocumentCode :
248034
Title :
Sum-max video pooling for complex event recognition
Author :
Sang Phan ; Duy-Dinh Le ; Satoh, S.
Author_Institution :
Grad. Univ. for Adv. Studies (SOKENDAI), Yokosuka, Japan
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
1026
Lastpage :
1030
Abstract :
A video can be viewed as a layered structure in which the lowest layer consists of frames, the top layer is the entire video, and the middle layers are sequences of consecutive frames or concatenations of lower layers. While it is easy to find local discriminative features in the lower layers of a video, it is non-trivial to aggregate these features into a discriminative video representation. In the literature, sum pooling is often used and obtains reasonable recognition performance on artificial videos. However, sum pooling does not work well on complex videos because the regions of interest may reside within some of the middle layers. In this paper, we leverage the layered structure of video to propose a new pooling method, named sum-max video pooling, to handle this problem. Specifically, we apply sum pooling to the low-layer representation and max pooling to the high-layer representation. Sum pooling keeps sufficient relevant features at the low layer, while max pooling retrieves the most relevant features at the high layer and can therefore discard irrelevant features from the final video representation. Experimental results on the TRECVID Multimedia Event Detection 2010 dataset show the effectiveness of our method.
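The pooling scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how layers are delimited, so the fixed-length, non-overlapping segments, the `segment_length` parameter, and the function names `sum_pool`, `max_pool`, and `sum_max_pool` are all assumptions made for illustration only.

```python
from typing import List, Sequence

Vector = List[float]


def sum_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum of equally sized feature vectors."""
    return [sum(column) for column in zip(*vectors)]


def max_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum of equally sized feature vectors."""
    return [max(column) for column in zip(*vectors)]


def sum_max_pool(frame_features: Sequence[Sequence[float]],
                 segment_length: int) -> Vector:
    """Sum-pool frames within each segment, then max-pool across segments.

    Assumption (hypothetical, not from the paper): the low layer is a set of
    non-overlapping segments of `segment_length` consecutive frames. The last
    segment may be shorter if the frame count is not a multiple of it.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not frame_features:
        raise ValueError("frame_features must not be empty")

    # Low layer: split the frame sequence into consecutive segments.
    segments = [
        frame_features[start:start + segment_length]
        for start in range(0, len(frame_features), segment_length)
    ]
    # Sum pooling inside each segment keeps the evidence found in its frames.
    segment_vectors = [sum_pool(segment) for segment in segments]
    # High layer: max pooling keeps, per dimension, the strongest segment.
    return max_pool(segment_vectors)


# Toy per-frame features (made-up numbers, two dimensions, four frames).
frames = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 3.0],
    [1.0, 2.0],
]
video_vector = sum_max_pool(frames, segment_length=2)
```

In this toy input, plain sum pooling over all frames accumulates the first dimension across both segments, whereas the sum-max variant keeps only the strongest segment response per dimension, which is the behavior the abstract attributes to the max-pooling stage.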
Keywords :
image recognition; image representation; video signal processing; TRECVID Multimedia Event Detection 2010 dataset; artificial videos; complex event recognition; complex videos; discriminative video representation; feature aggregation; frame sequences; high-layer representation; layered structure; local discriminative features; low-layer representation; lower-layer concatenation; middle layers; recognition performance; region of interests; sum-max video pooling; top layer; video representation; Aggregates; Computer vision; Event detection; Feature extraction; Multimedia communication; Noise measurement; Visualization; max-pooling; multimedia event detection; sum-max video pooling; sum-pooling; video representation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Image Processing (ICIP), 2014 IEEE International Conference on
Conference_Location :
Paris
Type :
conf
DOI :
10.1109/ICIP.2014.7025204
Filename :
7025204