Title :
Efficient Visual Search of Videos Cast as Text Retrieval
Author :
Sivic, Josef ; Zisserman, Andrew
Author_Institution :
Lab. d'Inf., Ecole Normale Super., Paris
Date :
4/1/2009
Abstract :
We describe an approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable. Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. This requires a visual analogy of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google. We report results for object retrieval on the full-length feature films 'Groundhog Day', 'Casablanca' and 'Run Lola Run', including searches from within the movie and searches specified by external images downloaded from the Internet. We investigate retrieval performance with respect to different quantizations of region descriptors and compare the performance of several ranking measures.
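The text-retrieval analogy in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cluster centres (`centroids`) are already given, uses toy 2-D vectors in place of real region descriptors, and picks one common tf-idf variant with cosine ranking. It omits the region tracking and the spatial-layout re-ranking that the abstract also mentions. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def quantize(descriptor, centroids):
    """Map a descriptor to its 'visual word': the index of the nearest centroid."""
    return min(range(len(centroids)),
               key=lambda k: sum((d - c) ** 2
                                 for d, c in zip(descriptor, centroids[k])))

def tfidf(counts, idf):
    """L2-normalised tf-idf vector (as a sparse dict) from visual-word counts."""
    total = sum(counts.values())
    vec = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else vec

def build_index(shots, centroids):
    """Build the inverted file, idf weights and per-shot tf-idf vectors."""
    words = {s: Counter(quantize(d, centroids) for d in descs)
             for s, descs in shots.items()}
    inverted = defaultdict(set)          # visual word -> shots containing it
    for s, counts in words.items():
        for w in counts:
            inverted[w].add(s)
    n = len(shots)
    idf = {w: math.log(n / len(ss)) for w, ss in inverted.items()}
    vectors = {s: tfidf(counts, idf) for s, counts in words.items()}
    return inverted, idf, vectors

def search(query_descs, centroids, inverted, idf, vectors):
    """Rank shots by cosine similarity, visiting only shots that share a word."""
    q = tfidf(Counter(quantize(d, centroids) for d in query_descs), idf)
    candidates = set().union(*(inverted.get(w, set()) for w in q)) if q else set()
    scores = {s: sum(q[w] * vectors[s].get(w, 0.0) for w in q)
              for s in candidates}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy data (assumed for illustration): three visual words, three shots.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
shots = {
    "shot_a": [(0.1, 0.2), (9.8, 0.1), (10.2, -0.3)],
    "shot_b": [(0.0, 9.9), (0.2, 10.1), (-0.1, 0.1)],
    "shot_c": [(0.3, -0.2), (0.1, 0.1)],
}
inverted, idf, vectors = build_index(shots, centroids)
ranked = search([(9.9, 0.0), (0.1, 0.0)], centroids, inverted, idf, vectors)
```

In this toy setup the word nearest `(0, 0)` occurs in every shot, so its idf weight is zero and it contributes nothing to the ranking, much as a very common word would in text retrieval. The query is therefore matched on the remaining word, which only `shot_a` contains.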
Keywords :
object recognition; query processing; text analysis; video retrieval; document frequency weightings; frame matching; inverted file systems; object retrieval; query image; region descriptors; statistical text retrieval; vector quantizing; visual search; Image/video retrieval
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2008.111