Object Recognition and Auto-annotation in News Videos

Author

Bastan; Duygulu

Author_Institution

Bilgisayar Mü

fYear

2006

fDate

2006

Firstpage

Lastpage

Abstract

We propose a new approach to the object recognition problem, motivated by the availability of large annotated image and video collections. By analogy with translation from one language to another, this approach treats object recognition as the translation of visual elements into words. The visual elements, represented in feature space, are first categorized into a finite set of blobs. The correspondences between the blobs and the words are then learned using a method adapted from statistical machine translation. Finally, the correspondences, in the form of a probability table, are used to predict words for particular image regions (region naming), to predict words for entire images (auto-annotation), or to associate the automatically generated speech transcript text with the correct video frames (video alignment). Experimental results are presented on the TRECVID 2004 data set, which consists of about 150 hours of news videos with associated manual annotations and speech transcript text.
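The abstract does not say which translation model is used, so the following is only a minimal sketch of the general idea, assuming an IBM Model 1 style EM procedure. The function names (`learn_translation_table`, `name_region`, `annotate`), the blob labels, and the toy data are all hypothetical and are not taken from the paper. Each training item pairs the blobs found in one image with that image's annotation words.

```python
from collections import defaultdict

def learn_translation_table(pairs, iterations=20):
    """Estimate p(word | blob) with EM (assumed IBM Model 1 style)."""
    blobs = {b for bs, _ in pairs for b in bs}
    words = {w for _, ws in pairs for w in ws}
    # Start from a uniform probability table.
    t = {b: {w: 1.0 / len(words) for w in words} for b in blobs}
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        for bs, ws in pairs:
            for w in ws:
                # E-step: share each word among the blobs of the same image.
                z = sum(t[b][w] for b in bs)
                for b in bs:
                    count[b][w] += t[b][w] / z
        # M-step: renormalize the expected counts per blob.
        for b in blobs:
            total = sum(count[b].values())
            if total > 0:
                for w in words:
                    t[b][w] = count[b][w] / total
    return t

def name_region(t, blob):
    """Region naming: the most probable word for a single blob."""
    return max(t[blob], key=t[blob].get)

def annotate(t, image_blobs, k=2):
    """Auto-annotation: top-k words by probability summed over the image's blobs."""
    scores = defaultdict(float)
    for b in image_blobs:
        for w, p in t[b].items():
            scores[w] += p
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: (blobs in an image, annotation words of that image).
pairs = [
    (["b_sky", "b_grass"], ["sky", "grass"]),
    (["b_sky", "b_water"], ["sky", "water"]),
    (["b_grass", "b_water"], ["grass", "water"]),
]
table = learn_translation_table(pairs)
region_word = name_region(table, "b_sky")
image_words = annotate(table, ["b_sky", "b_water"], k=2)
```

In this sketch the probability table is a nested dictionary indexed by blob and then by word. Region naming reads one row of the table, while auto-annotation combines the rows of all blobs in an image. Video alignment is not covered here.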

Keywords

"Object recognition","Videos","Speech","Electrostatic precipitators","Probability","NIST"

Publisher

IEEE

Conference_Titel

Signal Processing and Communications Applications, 2006 IEEE 14th

ISSN

2165-0608

Print_ISBN

1-4244-0238-7

Type

conf

DOI

10.1109/SIU.2006.1659821

Filename

1659821

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3622297