Title :
Supervised Latent Dirichlet Allocation Models for Efficient Activity Representation
Author :
Umakanthan, Sabanadesan ; Denman, Simon ; Fookes, Clinton ; Sridharan, Sridha
Author_Institution :
Image & Video Res. Lab., Queensland Univ. of Technol., Brisbane, QLD, Australia
Abstract :
Local spatio-temporal features combined with a bag-of-visual-words model are a popular approach to human action recognition. Bag-of-features methods face several challenges: extracting appropriate appearance and motion features from videos, converting the extracted features into a representation suitable for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification in order to improve overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features in a form suitable for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poorer results than the baseline bag-of-words framework. Supervised LDA techniques, on the other hand, learn the topic structure using the class labels and improve recognition accuracy significantly. MedLDA maximizes the likelihood and within-class margins using max-margin techniques and yields a sparse, highly discriminative topic structure, while css-LDA learns separate class-specific topics instead of a common set of topics across the entire dataset. In our representation, topics are learned first and each video is then represented as a topic proportion vector, which is comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the efficiency of these two representation techniques through experiments on two popular datasets. Experimental results show significantly improved performance compared to the baseline bag-of-features framework, which uses k-means to construct histograms of words from the feature vectors.
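The two representations contrasted in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a k-means codebook and a topic-word matrix `beta` have already been learned (how MedLDA or css-LDA learn `beta` using class labels is not shown), and it estimates a video's topic proportion vector by a simple EM fold-in of the mixing weights with `beta` held fixed, a swapped-in simplification of the posterior inference used in LDA-family models. The function names `bow_histogram` and `topic_proportions` are hypothetical.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Baseline bag-of-words: assign each local descriptor (N, D) to its
    nearest codeword (K, D), e.g. a k-means centre, and count assignments."""
    # Squared Euclidean distance from every descriptor to every codeword: (N, K)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(float)


def topic_proportions(counts, beta, n_iter=200, tol=1e-8):
    """Hypothetical fold-in: estimate topic proportions theta (K,) for one
    video's word counts (V,) with the topic-word matrix beta (K, V) fixed.
    Assumes beta is strictly positive (smoothed topics) and counts.sum() > 0.
    This is EM on mixture weights, not variational LDA/MedLDA inference."""
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of topic k for word v, proportional to theta_k * beta_kv
        joint = theta[:, None] * beta
        resp = joint / joint.sum(axis=0, keepdims=True)
        # M-step: expected number of words per topic, normalised to sum to 1
        new_theta = (resp * counts[None, :]).sum(axis=1) / counts.sum()
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Toy example: 2-D descriptors, a 2-word codebook, and 2 topics
    codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
    descriptors = np.array([[0.1, 0.2], [9.8, 10.1], [0.3, -0.1]])
    hist = bow_histogram(descriptors, codebook)  # word counts [2, 1]
    beta = np.array([[0.9, 0.1], [0.1, 0.9]])
    theta = topic_proportions(hist, beta)  # approximately [0.708, 0.292]
    print(hist, theta)
```

In the pipeline the abstract describes, the per-video `theta` vectors (rather than the raw word histograms) would then be fed to an SVM; the classifier step is omitted here because any standard SVM implementation can consume these fixed-length vectors.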
Keywords :
feature extraction; gesture recognition; image classification; image representation; statistical analysis; support vector machines; unsupervised learning; MedLDA; bag-of-visual words model; class-specific simplex LDA; css-LDA; discriminative SVM based classification; efficient activity representation; feature extraction; generative supervised topic models; human action recognition; local spatio-temporal features; max-margin techniques; maximum entropy discrimination LDA; suitable classification framework; supervised latent Dirichlet allocation models; topic proportion vector; unsupervised LDA models; Accuracy; Feature extraction; Hidden Markov models; Histograms; Support vector machines; Vectors; Vocabulary;
Conference_Title :
Digital Image Computing: Techniques and Applications (DICTA), 2014 International Conference on
Conference_Location :
Wollongong, NSW
DOI :
10.1109/DICTA.2014.7008130