DocumentCode
254100
Title
Towards Good Practices for Action Video Encoding
Author
Jianxin Wu ; Yu Zhang ; Weiyao Lin
Author_Institution
Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
fYear
2014
fDate
23-28 June 2014
Firstpage
2577
Lastpage
2584
Abstract
High-dimensional representations such as VLAD or Fisher vectors (FV) have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost at only negligible computational cost. We empirically evaluate various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum entropy linear feature learning process. Combining this new perspective with observed properties of the VLAD data distribution, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51, and YouTube), the bimodal encoding improves on VLAD by large margins in action recognition.
Keywords
feature extraction; image recognition; image representation; maximum entropy methods; video coding; FV encoding framework; VLAD data distribution properties; VLAD-based video encoding; action recognition; action video encoding; benchmark action recognition datasets; bimodal encoding method; Fisher vector; good practices; high dimensional representations; maximum entropy linear feature learning process; Accuracy; Encoding; Feature extraction; Gaussian distribution; Principal component analysis; Vectors; YouTube;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
Conference_Location
Columbus, OH
Type
conf
DOI
10.1109/CVPR.2014.330
Filename
6909726