Title :
A Matrix-Based Approach to Unsupervised Human Action Categorization
Author :
Cui, Peng ; Wang, Fei ; Sun, Li-Feng ; Zhang, Jian-Wei ; Yang, Shi-Qiang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
Human action, as the basic unit of most human-relevant video content, bridges the gap between low-level visual features and high-level semantics. Human action recognition is of great significance in applications such as human-computer interaction, intelligent video surveillance, and video retrieval and search. In this paper, we propose a novel unsupervised approach to mining categories from action video sequences. It consists of two modules: an action representation for video data structurization and a learning model for unsupervised categorization. In the action representation, a novel view of video decomposition is presented. Videos are regarded as spatially distributed dynamic pixel time series, and these dynamic pixels are first quantized into pixel prototypes. After the pixel time series are replaced with their corresponding prototype labels, the video sequences are compressed into two-dimensional action matrices. In the learning model, we stack these matrices to form a multi-action tensor and propose a joint matrix factorization method that simultaneously clusters the pixel prototypes into pixel signatures and the matrices into action classes, taking into account the duality between pixel clustering and action clustering. The approach is tested on the widely used public Weizmann and KTH datasets, and promising results are achieved.
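The action-representation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means on raw intensity series stands in for the unspecified quantizer and pixel feature, all function names are hypothetical, and the joint matrix factorization stage is not covered.

```python
import math

def pixel_series(video):
    """Turn a video (list of T frames, each an H x W nested list) into H*W time series."""
    height, width = len(video[0]), len(video[0][0])
    return [[frame[r][c] for frame in video]
            for r in range(height) for c in range(width)]

def nearest(series, prototypes):
    """Index of the prototype closest to `series` in Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda j: math.dist(series, prototypes[j]))

def learn_prototypes(all_series, k, iters=10):
    """Assumed quantizer: plain k-means, initialized from the first k distinct series."""
    prototypes = []
    for s in all_series:
        if s not in prototypes:
            prototypes.append(list(s))
        if len(prototypes) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in prototypes]
        for s in all_series:
            groups[nearest(s, prototypes)].append(s)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous prototype
                prototypes[j] = [sum(v) / len(members) for v in zip(*members)]
    return prototypes

def action_matrix(video, prototypes):
    """Replace each pixel's time series by its prototype label, giving an H x W matrix."""
    width = len(video[0][0])
    labels = [nearest(s, prototypes) for s in pixel_series(video)]
    return [labels[i:i + width] for i in range(0, len(labels), width)]

# Toy 2 x 2 video over 4 frames: left column static, right column blinking.
static, blink = [0, 0, 0, 0], [0, 1, 0, 1]
video = [[[static[t], blink[t]], [static[t], blink[t]]] for t in range(4)]
prototypes = learn_prototypes(pixel_series(video), k=2)
matrix = action_matrix(video, prototypes)
print(matrix)  # [[0, 1], [0, 1]]
```

In this toy case the two static pixels share one prototype label and the two blinking pixels share the other, so the whole clip collapses to a single 2 x 2 label matrix. The sketch assumes all videos have the same number of frames; clips of different lengths would need a fixed-length pixel feature before quantization.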
Keywords :
category theory; data mining; feature extraction; human computer interaction; image motion analysis; image sequences; learning (artificial intelligence); matrix decomposition; pattern clustering; quantisation (signal); signal classification; signal representation; source separation; tensors; time series; video signal processing; 2D action matrix; action clustering; action representation; action video sequences; category mining; dynamic pixel quantization; high-level semantics; human action recognition; human-computer interaction; human-relevant video content; intelligent video surveillance; joint matrix factorization method; learning model; low-level visual features; matrix-based approach; multiaction tensor; pixel prototype clustering; pixel signature; spatially distributed dynamic pixel time series; unsupervised human action categorization; video data structurization; video decomposition; video retrieval; video search; video sequence compression; Discrete Fourier transforms; Feature extraction; Prototypes; Semantics; Tensile stress; Time series analysis; Video sequences; Action categorization; joint matrix factorization; tensor representation; video analysis;
Journal_Title :
IEEE Transactions on Multimedia
DOI :
10.1109/TMM.2011.2176110