Regional Information Center for Science and Technology (RICeST) - Unsupervised Alignment of News Video and Text Using Visual Patterns and Textual Concepts

DocumentCode :

1394137

Title :

Unsupervised Alignment of News Video and Text Using Visual Patterns and Textual Concepts

Author :

Yeh, Jun-Bin ; Wu, Chung-Hsien ; Chang, Sheng-Xiong

Author_Institution :

Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan

Volume :

Issue :

fYear :

2011

fDate :

4/1/2011 12:00:00 AM

Firstpage :

206

Lastpage :

215

Abstract :

A brief preview of a news video can be generated by semantically aligning the sentences of the anchor report, in which the anchor summarizes the story, with the visual field shots. Because accurately detecting objects in a visual shot is difficult and a textual term may have several synonyms, aligning an anchor sentence with a video shot remains challenging. In this study, the temporal relation among the frames in a visual shot is characterized by a visual language model, and this language model-based temporal relation is then applied to sentence-based alignment. The bag-of-words representations of the main objects in the key frames of a visual shot are first mapped to visual patterns trained from the news video database. The textual terms in the report sentence are likewise mapped to textual concepts obtained from the HowNet knowledge base. Finally, unsupervised alignment between the textual concepts and the visual patterns in the news videos is performed using IBM Model 1. In the evaluation, the visual pattern language model yields an alignment score of 0.77, exceeding the 0.66 obtained with the dynamic time warping (DTW) method. Broken down by news category, visual pattern discovery and textual concept discovery improve alignment performance in most categories.
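The final alignment step named in the abstract, IBM Model 1, can be illustrated with a minimal sketch. This is not the authors' implementation: the concept and pattern labels, the function names `train_ibm_model1` and `best_pattern`, the translation direction (visual pattern given textual concept), and the omission of a NULL token are all assumptions made for illustration only.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(pattern | concept) with EM, as in IBM Model 1.

    pairs: list of (concepts, patterns), each a list of string labels.
    Simplification (assumption): no NULL concept is added.
    """
    # Start uniform over every (pattern, concept) pair that co-occurs.
    t = {}
    for concepts, patterns in pairs:
        for p in patterns:
            for c in concepts:
                t[(p, c)] = 1.0

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (pattern, concept)
        total = defaultdict(float)  # expected count of each concept
        # E-step: distribute each pattern over the concepts in its pair.
        for concepts, patterns in pairs:
            for p in patterns:
                norm = sum(t[(p, c)] for c in concepts)
                for c in concepts:
                    share = t[(p, c)] / norm
                    count[(p, c)] += share
                    total[c] += share
        # M-step: renormalize so probabilities sum to 1 for each concept.
        t = {(p, c): n / total[c] for (p, c), n in count.items()}
    return t


def best_pattern(concept, t):
    """Return the pattern with the highest t(pattern | concept)."""
    candidates = {p: prob for (p, c), prob in t.items() if c == concept}
    return max(candidates, key=candidates.get)


# Hypothetical toy data: textual concepts per sentence, visual patterns per shot.
pairs = [
    (["fire", "building"], ["vp_flame", "vp_facade"]),
    (["fire", "truck"], ["vp_flame", "vp_vehicle"]),
    (["building"], ["vp_facade"]),
]
table = train_ibm_model1(pairs)
```

Calling `best_pattern("fire", table)` then returns the pattern that the toy data most strongly associate with that concept. No labelled sentence-to-shot alignments are needed, which is the sense in which the procedure is unsupervised.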

Keywords :

image representation; image sequences; information resources; knowledge based systems; object detection; text analysis; video databases; video signal processing; visual languages; HowNet knowledge base; anchor sentence; bag-of-word representations; news video database; sentence-based alignment; temporal relation; textual sentence; unsupervised news text alignment; unsupervised news video alignment; video frame; video shot; visual pattern; visual pattern language model; Preview generation; textual concept; unsupervised alignment

fLanguage :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/TMM.2010.2095412

Filename :

5657260

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1394137