DocumentCode
3672610
Title
TVSum: Summarizing web videos using titles
Author
Yale Song;Jordi Vallmitjana;Amanda Stent;Alejandro Jaimes
Author_Institution
Yahoo Labs, New York, USA
fYear
2015
fDate
6/1/2015 12:00:00 AM
Firstpage
5179
Lastpage
5187
Abstract
Video summarization is a challenging problem in part because knowing which part of a video is important requires prior knowledge about its main topic. We present TVSum, an unsupervised video summarization framework that uses title-based image search results to find visually important shots. We observe that a video title is often carefully chosen to be maximally descriptive of its main topic, and hence images related to the title can serve as a proxy for important visual concepts of the main topic. However, because titles are free-form, unconstrained, and often written ambiguously, images retrieved using the title can contain noise (images irrelevant to the video content) and variance (images of different topics). To deal with this challenge, we develop a novel co-archetypal analysis technique that learns canonical visual concepts shared between video and images, but not present in either alone, by finding a joint-factorial representation of the two data sets. We introduce a new benchmark dataset, TVSum50, that contains 50 videos and their shot-level importance scores annotated via crowdsourcing. Experimental results on two datasets, SumMe and TVSum50, suggest that our approach produces higher-quality summaries than several recently proposed approaches.
Keywords
"Videos","Yttrium","Visualization","Optimization","Approximation methods","Crowdsourcing","Focusing"
Publisher
IEEE
Conference_Titel
Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on
Electronic_ISBN
1063-6919
Type
conf
DOI
10.1109/CVPR.2015.7299154
Filename
7299154