Author_Institution :
Dept. of Comput. Sci., Columbia Univ., New York, NY, USA
Abstract :
In the summer of 2003, over 100 researchers in video understanding used an interactive intelligent tool to annotate more than 62 hours of news video from the NIST TRECVID database, spanning six months of 1998. These 47K shots, with 433K labels from over 1000 visual concept categories, comprise the largest publicly available ground truth for this domain. Our analysis of this data, combining the tools of statistical natural language processing, machine learning, and computer vision, finds significant novel statistical patterns that can be exploited to accurately track the episodes of a given news story over time, using semantic labels that are solely visual. We find that the ground "truth" is very muddy, but by using the feature selection tool of information gain, we extract 14 reliable visual concepts with mid-frequency use; all but one are visual concepts that refer to settings, rather than actors, objects, or events. We discover that the probability that another episode of a named story recurs after a gap of d days is proportional to 1/(d + 1). We define a novel similarity measure between episodes i and j that incorporates both semantic and temporal properties: Dice(i, j)/(1 + gap(i, j)). We exploit a low-level computer vision technique, normalized cut (Laplacian eigenmaps), for clustering these episodes into stories, and in the process document a weakness of this popular technique. We use these empirical results to make specific recommendations on how better visual semantic ontologies for news stories, and better video annotation tools, should be designed.
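The similarity measure Dice(i, j)/(1 + gap(i, j)) stated in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how Dice or gap are computed, so the following assumes that each episode is represented as a set of visual concept labels, that Dice is the standard set-based Dice coefficient 2|A ∩ B|/(|A| + |B|), and that gap is the absolute difference in days between the two episodes. The function names and the sample labels are hypothetical and are not taken from the paper.

```python
def dice(labels_a: set, labels_b: set) -> float:
    """Standard Dice coefficient between two label sets (assumed definition)."""
    total = len(labels_a) + len(labels_b)
    if total == 0:
        # Two empty label sets share nothing measurable; treat as dissimilar.
        return 0.0
    return 2 * len(labels_a & labels_b) / total


def episode_similarity(labels_i: set, day_i: int,
                       labels_j: set, day_j: int) -> float:
    """Semantic overlap discounted by temporal distance: Dice / (1 + gap).

    The gap is assumed to be the absolute difference between day indices.
    """
    gap = abs(day_i - day_j)
    return dice(labels_i, labels_j) / (1 + gap)


# Hypothetical episodes: sets of visual concept labels plus a day index.
episode_a = {"studio", "sky", "outdoors"}
episode_b = {"studio", "sky", "crowd"}

same_day = episode_similarity(episode_a, 10, episode_b, 10)
three_days_apart = episode_similarity(episode_a, 10, episode_b, 13)
```

Under these assumptions, two episodes with identical label overlap score lower the further apart they are in time, which mirrors the 1/(d + 1) recurrence pattern reported in the abstract.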
Keywords :
computer vision; eigenvalues and eigenfunctions; learning (artificial intelligence); natural languages; ontologies (artificial intelligence); publishing; statistical analysis; Laplacian eigenmaps; NIST TRECVID database; NIST TRECVID video annotation; computer vision; episode clustering; feature selection; interactive intelligent tool; machine learning; news episode tracking; news story tracking; semantic labels; statistical natural language processing; statistical patterns; video annotation tools; visual concepts; Computer vision; Data analysis; Data mining; Deductive databases; Laplace equations; Machine learning; NIST; Natural language processing; Pattern analysis; Visual databases;