Generating coherent natural language annotations for video streams

Author

Khan, Muhammad Usman Ghani; Zhang, Lei; Gotoh, Yusuke

Author_Institution

Univ. of Sheffield, Sheffield, UK

fYear

2012

fDate

Sept. 30 2012-Oct. 3 2012

Firstpage

2893

Lastpage

2896

Abstract

This contribution addresses the generation of natural language annotations for human actions, behaviour, and interactions with other objects observed in video streams. The work starts by applying conventional image processing techniques to extract high-level features from individual frames. A natural language description of the frame contents is then produced from these high-level features. Although the feature extraction processes are error-prone at various levels, we explore approaches for combining their outputs into a coherent description. To extend the approach to video streams, units of features are introduced that incorporate spatial and temporal information, yielding coherent, smooth, and well-phrased descriptions. The approach is evaluated by calculating ROUGE scores between human-annotated and machine-generated descriptions.
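
The ROUGE evaluation mentioned in the abstract can be illustrated with a minimal sketch of ROUGE-N, which counts the n-grams shared between a human-annotated reference and a machine-generated description. This is an assumption-laden illustration, not the authors' evaluation setup: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenisation, and the example sentences are all hypothetical, and the abstract does not say which ROUGE variants were used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Compute ROUGE-N recall, precision and F1 for one reference/candidate pair.

    Assumption: simple lowercase whitespace tokenisation, with no stemming
    or stop-word removal.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Hypothetical descriptions, not taken from the paper's data.
    human = "a man is walking in the park"
    machine = "a man walks in the park"
    print(rouge_n(human, machine, n=1))
    print(rouge_n(human, machine, n=2))
```

In this sketch, recall measures how much of the human annotation is recovered by the machine description, while precision penalises extra words in the machine output. Higher-order n-grams (for example `n=2`) reward matching word order as well as word choice.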

Keywords

feature extraction; natural language processing; video signal processing; video streaming; ROUGE scores; human action; human annotated description; image processing; machine generated description; natural language annotation; natural language description; video stream; Humans; Legged locomotion; Natural languages; Streaming media; Video sequences; Visualization; Video annotation; Video processing; video feature units

fLanguage

English

Publisher

IEEE

Conference_Titel

Image Processing (ICIP), 2012 19th IEEE International Conference on

Conference_Location

Orlando, FL

ISSN

1522-4880

Print_ISBN

978-1-4673-2534-9

Electronic_ISBN

1522-4880

Type

conf

DOI

10.1109/ICIP.2012.6467504

Filename

6467504