Title :
Effective Codebooks for Human Action Categorization
Author :
Ballan, Lamberto ; Bertini, Marco ; Del Bimbo, Alberto ; Seidenari, Lorenzo ; Serra, Giuseppe
Author_Institution :
Media Integration & Commun. Center, Univ. of Florence, Florence, Italy
Date :
Sept. 27 2009-Oct. 4 2009
Abstract :
In this paper we propose a new method for human action categorization that combines novel gradient and optic flow descriptors with a more effective codebook, one that models the ambiguity of feature assignment in the traditional bag-of-words model. Recent approaches have represented video sequences as a bag of spatio-temporal visual words, following the successful results achieved in object and scene classification. Codebooks are usually obtained by k-means clustering and hard assignment of each visual feature to its best-representing codeword. Our contribution is two-fold. First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient: cluster centers are attracted to the denser regions of the sample distribution, which yields a non-uniform description of the feature space and fails to code other informative regions. We therefore apply a radius-based clustering method and a soft assignment that considers two or more relevant candidate codewords. This approach generates a more effective codebook, resulting in a further improvement in classification performance. We test our approach extensively on the standard KTH and Weizmann action datasets, demonstrating its validity and outperforming other recent approaches.
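The two codebook ideas named in the abstract, radius-based clustering and soft assignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering procedure or the weighting scheme, so the code below swaps in a simple greedy, leader-style radius clustering and Gaussian-kernel weights over the `k` nearest codewords. The function names `radius_codebook` and `soft_histogram`, and the parameters `radius`, `sigma`, and `k`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def radius_codebook(features, radius):
    """Greedy radius-based clustering (illustrative stand-in).

    A feature that lies farther than `radius` from every existing
    codeword becomes a new codeword, so codewords spread over the
    feature space instead of concentrating in its densest regions.
    """
    centers = [features[0]]
    for x in features[1:]:
        dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if dists.min() > radius:
            centers.append(x)
    return np.asarray(centers)


def soft_histogram(features, codebook, sigma, k=2):
    """Bag-of-words histogram with soft assignment (assumed weighting).

    Each feature votes for its k nearest codewords with Gaussian-kernel
    weights normalized to sum to 1, rather than casting a single hard
    vote for the nearest codeword.
    """
    hist = np.zeros(len(codebook))
    k = min(k, len(codebook))
    for x in features:
        dists = np.linalg.norm(codebook - x, axis=1)
        nearest = np.argsort(dists)[:k]
        d2 = dists[nearest] ** 2
        # Shift by the smallest squared distance so the largest weight
        # is exactly 1 and the normalization never divides by zero.
        weights = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
        hist[nearest] += weights / weights.sum()
    return hist / len(features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for spatio-temporal descriptors.
    descriptors = rng.normal(size=(500, 8))
    codebook = radius_codebook(descriptors, radius=3.0)
    histogram = soft_histogram(descriptors, codebook, sigma=1.0, k=2)
    print(len(codebook), histogram.sum())
```

In this sketch the radius controls codebook size indirectly: a smaller radius produces more codewords. Setting `k=1` reduces the histogram to the usual hard assignment, which makes it easy to compare the two schemes on the same codebook.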
Keywords :
image classification; image sequences; pattern clustering; 3D gradient descriptor; KTH; Weizmann action datasets; bag-of-words model; classification performances; effective codebooks; human action categorization; k-means clustering; optic flow descriptors; radius-based clustering method; spatio-temporal visual words; video sequences; Data mining; Detectors; History; Humans; Image motion analysis; Layout; Robustness; Shape control; Shape measurement; Vocabulary
Conference_Title :
Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-4442-7
Electronic_ISBN :
978-1-4244-4441-0
DOI :
10.1109/ICCVW.2009.5457658