• DocumentCode
    438799
  • Title

    Visual concepts for news story tracking: analyzing and exploiting the NIST TRECVID video annotation experiment

  • Author

    Kender, John R. ; Naphade, Milind R.

  • Author_Institution
    Dept. of Comput. Sci., Columbia Univ., New York, NY, USA
  • Volume
    1
  • fYear
    2005
  • fDate
    20-25 June 2005
  • Firstpage
    1174
  • Abstract
    In the summer of 2003, using an interactive intelligent tool, over 100 researchers in video understanding annotated over 62 hours of news video from the NIST TRECVID database, spanning six months of 1998. These 47K shots with 433K labels from over 1000 visual concept categories comprise the largest publicly available ground truth for this domain. Our analysis of this data, combining the tools of statistical natural language processing, machine learning, and computer vision, finds significant novel statistical patterns that can be exploited to accurately track the episodes of a given news story over time, using semantic labels that are solely visual. We find that the ground "truth" is very muddy, but by using information gain as a feature selection tool, we extract 14 reliable visual concepts with mid-frequency use; all but one are visual concepts that refer to settings, rather than actors, objects, or events. We discover that the probability that another episode of a named story recurs after a gap of d days is proportional to 1/(d + 1). We define a novel similarity measure between episodes i and j that incorporates both semantic and temporal properties: Dice(i, j)/(1 + gap(i, j)). We exploit a low-level computer vision technique, normalized cut (Laplacian eigenmaps), for clustering these episodes into stories, and in the process document a weakness of this popular technique. We use these empirical results to make specific recommendations on how better visual semantic ontologies for news stories, and better video annotation tools, should be designed.
  • Keywords
    computer vision; eigenvalues and eigenfunctions; learning (artificial intelligence); natural languages; ontologies (artificial intelligence); publishing; statistical analysis; Laplacian eigenmaps; NIST TRECVID database; NIST TRECVID video annotation; episode clustering; feature selection; interactive intelligent tool; machine learning; news episode tracking; news story tracking; semantic labels; statistical natural language processing; statistical patterns; video annotation tools; visual concepts; Computer vision; Data analysis; Data mining; Deductive databases; Laplace equations; Machine learning; NIST; Natural language processing; Pattern analysis; Visual databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
  • ISSN
    1063-6919
  • Print_ISBN
    0-7695-2372-2
  • Type

    conf

  • DOI
    10.1109/CVPR.2005.371
  • Filename
    1467399
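
The similarity measure stated in the abstract, Dice(i, j)/(1 + gap(i, j)), can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each episode is represented by a set of visual concept labels and a broadcast date, and that gap(i, j) is the absolute number of days between the two episodes. The function names `dice` and `episode_similarity` and the example labels are hypothetical.

```python
from datetime import date


def dice(labels_i, labels_j):
    """Dice coefficient between two collections of concept labels."""
    a, b = set(labels_i), set(labels_j)
    if not a and not b:
        # Assumption: two unlabeled episodes are treated as dissimilar.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def episode_similarity(labels_i, day_i, labels_j, day_j):
    """Dice(i, j) / (1 + gap(i, j)), with gap assumed to be in whole days."""
    gap = abs((day_j - day_i).days)
    return dice(labels_i, labels_j) / (1 + gap)


# Hypothetical episodes: two label sets sharing 2 of 3 labels, one day apart.
ep_a = {"studio", "outdoors", "crowd"}
ep_b = {"studio", "crowd", "podium"}
sim = episode_similarity(ep_a, date(1998, 3, 1), ep_b, date(1998, 3, 2))
print(round(sim, 4))
```

Under these assumptions, episodes with identical labels broadcast on the same day score 1, and the score decays with the gap in the same 1/(d + 1) form as the recurrence probability reported in the abstract.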