  • DocumentCode
    254100
  • Title
    Towards Good Practices for Action Video Encoding
  • Author
    Jianxin Wu; Yu Zhang; Weiyao Lin
  • Author_Institution
    Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
  • fYear
    2014
  • fDate
    23-28 June 2014
  • Firstpage
    2577
  • Lastpage
    2584
  • Abstract
    High-dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost at negligible computational cost. We empirically evaluate various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum entropy linear feature learning process. Combining this new perspective with observed properties of the VLAD data distribution, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding improves on VLAD by large margins in action recognition.
  • Keywords
    feature extraction; image recognition; image representation; maximum entropy methods; video coding; FV encoding framework; VLAD data distribution properties; VLAD-based video encoding; action recognition; action video encoding; benchmark action recognition datasets; bimodal encoding method; Fisher vector; good practices; high dimensional representations; maximum entropy linear feature learning process; Accuracy; Encoding; Feature extraction; Gaussian distribution; Principal component analysis; Vectors; YouTube
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
  • Conference_Location
    Columbus, OH
  • Type
    conf
  • DOI
    10.1109/CVPR.2014.330
  • Filename
    6909726