ViTex: Video To Tex and Its Application in Aerial Video Surveillance

Author

Cheng, Hui ; Butler, Darren ; Basu, Chumki

Author_Institution

Sarnoff Corporation, 201 Washington Rd, Princeton, NJ

Volume

1

fYear

2006

fDate

17-22 June 2006

Firstpage

586

Lastpage

593

Abstract

Given the huge amount of aerial surveillance video captured daily, an automated video understanding system is needed to extract information and generate metadata that is easy to search, browse and summarize, and that an end user can readily understand. In this paper, we propose a Video-To-Text engine called ViTex that automatically generates text descriptions of the content of a video. The ViTex engine first segments an input video sequence according to pre-defined semantic classes using a Mixture-of-Expert blob segmentation algorithm. The resulting segmentation is converted into an initial semantic concept graph based on domain knowledge and a semantic concept hierarchy. This initial graph is then summarized and pruned. Finally, according to the summarized semantic concept graph and its changes over time, text descriptions are automatically generated using one of three description schemes: key-frame, key-object and key-change descriptions. We have applied the ViTex engine to aerial surveillance video and compared its performance with ground-truth text descriptions generated by humans.

Keywords

Data mining; Engines; Humans; Layout; National security; Natural languages; Research and development; Video sequences; Video surveillance; Videoconference

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on

ISSN

1063-6919

Print_ISBN

0-7695-2597-0

Type

conf

DOI

10.1109/CVPR.2006.332

Filename

1640808