• DocumentCode
    57064
  • Title

    Discovering Video Shot Categories by Unsupervised Stochastic Graph Partition

  • Author

    Duan, Xiaohua ; Lin, Liang ; Chao, Hongyang

  • Author_Institution
    Sun Yat-Sen Univ., Guangzhou, China
  • Volume
    15
  • Issue
    1
  • fYear
    2013
  • fDate
    Jan. 2013
  • Firstpage
    167
  • Lastpage
    180
  • Abstract
    Video shots are often treated as the basic elements for retrieving information from videos. In recent years, video shot categorization has received increasing attention, but most methods rely on supervised learning, i.e., training a multi-class predictor (classifier) on labeled data. In this paper, we study a general framework for discovering video shot categories without supervision. The contributions are three-fold, in feature, representation, and inference: (1) A new feature is proposed to capture local information in videos, defined on small video patches (e.g., 11 × 11 × 5 pixels). A dictionary of video words can thus be clustered off-line, characterizing both appearance and motion dynamics. (2) We pose categorization as an automated graph partition task, in which each graph vertex represents a video shot and each partitioned sub-graph of connected vertices represents a clustered category. The model of each video shot category can be calculated analytically by a projection-pursuit type of learning process. (3) An MCMC-based cluster sampling algorithm, namely Swendsen-Wang cuts, is adopted to solve the graph partition efficiently. Unlike traditional graph partition techniques, this algorithm is able to reach a nearly globally optimal solution and eliminates the need for good initialization. We apply our method to a wide variety of 1600 video shots collected from the Internet as well as a subset of the TRECVID 2010 data, and two benchmark metrics, Purity and Conditional Entropy, are adopted for evaluating performance. The experimental results demonstrate superior performance of our method over other popular state-of-the-art methods.
  • Keywords
    information retrieval; learning (artificial intelligence); stochastic processes; video retrieval; video signal processing; MCMC-based cluster sampling algorithm; Swendsen-Wang cuts; TRECVID 2010 data; automated graph partition task; benchmark metrics; conditional entropy; connected graph vertices; graph vertex; local information; motion dynamics; multiclass predictor; nearly global optimal solution; purity entropy; supervised learning; traditional graph partition techniques; unsupervised stochastic graph partition; video information retrieval; video patches; video shot categorization; video shot category discovery; video words; Clustering algorithms; Dictionaries; Dynamics; Image color analysis; Manifolds; Stochastic processes; Vectors; Category discovery; graph partition; unsupervised categorization; video shot
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2012.2225029
  • Filename
    6331538
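
The abstract names Purity and Conditional Entropy as its evaluation metrics but does not define them. The sketch below uses the common textbook definitions of both, which may differ in detail (for example, in normalization or logarithm base) from those used in the paper. The function names `purity` and `conditional_entropy` and the label lists are illustrative assumptions, not taken from the source.

```python
from collections import Counter
from math import log2


def _cluster_counts(labels_true, labels_pred):
    """Map each predicted cluster to a Counter of the true classes it contains."""
    clusters = {}
    for t, c in zip(labels_true, labels_pred):
        clusters.setdefault(c, Counter())[t] += 1
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority true class of their cluster.

    1.0 means every cluster contains a single class (textbook definition,
    assumed here; the paper's exact formula is not given in the abstract).
    """
    clusters = _cluster_counts(labels_true, labels_pred)
    n = len(labels_true)
    return sum(max(cnt.values()) for cnt in clusters.values()) / n


def conditional_entropy(labels_true, labels_pred):
    """H(true class | cluster) in bits; 0.0 means every cluster is pure."""
    clusters = _cluster_counts(labels_true, labels_pred)
    n = len(labels_true)
    h = 0.0
    for cnt in clusters.values():
        size = sum(cnt.values())
        for k in cnt.values():
            # joint probability k/n times log of conditional probability k/size
            h -= (k / n) * log2(k / size)
    return h


if __name__ == "__main__":
    truth = ["sports", "sports", "news", "news"]   # hypothetical shot labels
    found = [0, 0, 1, 1]                           # hypothetical cluster ids
    print(purity(truth, found), conditional_entropy(truth, found))
```

Under these definitions, higher Purity and lower Conditional Entropy both indicate clusters that align more closely with the ground-truth categories.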