DocumentCode :
718037
Title :
Learning sparse shape bases for human action recognition
Author :
Moayedi, F. ; Dashti, S.E. ; Boostani, R. ; Azimifar, Z.
Author_Institution :
Comput. Sci. & Eng. Dept., Shiraz Univ., Shiraz, Iran
fYear :
2015
fDate :
10-14 May 2015
Firstpage :
755
Lastpage :
760
Abstract :
Human action recognition from image sequences is a challenging problem in computer vision. In this paper we propose a new approach based on the bag-of-words (BoW) framework. A video is modeled as a sequence of visual words, where each human pose corresponds to a “word”, and the codebook is learned with sparse coding, a well-known unsupervised feature learning method. The main advantages of applying sparse coding in our framework are the extraction of intrinsic high-level bases and a reduction in vector quantization error. Because sparse coding relies on overcomplete basis sets, scaling it to high-resolution data is computationally expensive. To address this problem, the main contribution of this work is to apply sparse coding to a set of shape filter banks estimated via multi-resolution decomposition methods such as Empirical Mode Decomposition (EMD) and Principal Components Analysis (PCA). As a result, the number of bases depends on the size of the filter banks rather than on the input image size. The projected coefficients are integrated by temporal max pooling to generate the final representation. In the classification stage, a linear-kernel SVM operating on the sparse coding statistics achieved satisfactory accuracy. We evaluate our method on the KTH, Weizmann and UCF-sports human action datasets. The results are either comparable to, or significantly better than, previously reported results on these datasets.
Keywords :
channel bank filters; computer vision; feature extraction; image classification; image motion analysis; image representation; image sequences; learning (artificial intelligence); object recognition; principal component analysis; support vector machines; video signal processing; BoW framework; EMD; KTH dataset; PCA; UCF-sports human action dataset; Weizmann dataset; bag-of-words framework; computer vision; empirical mode decomposition; human action recognition; image sequences; image size; intrinsic high level bases extraction; linear kernel SVM; principal components analysis; representation generation; shape filter banks; sparse coding method; sparse shape base learning; support vector machines; temporal max pooling; unsupervised feature learning method; vector quantization error reduction; Conferences; Electrical engineering; Eigen-poses; Human action recognition; empirical mode decomposition; intrinsic mode function; principal component analysis; sparse coding; unsupervised features learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical Engineering (ICEE), 2015 23rd Iranian Conference on
Conference_Location :
Tehran
Print_ISBN :
978-1-4799-1971-0
Type :
conf
DOI :
10.1109/IranianCEE.2015.7146314
Filename :
7146314