Generating natural language description of human behavior from video images

Author

Kojima, Atsuhiro ; Izumi, Masao ; Tamura, Takeshi ; Fukunaga, Kunio

Author_Institution

Libr. & Sci. Inf. Center, Osaka Prefecture Univ., Japan

Volume

4

fYear

2000

fDate

2000

Firstpage

728

Abstract

In visual surveillance applications, it is becoming common to analyze video images and interpret them using natural language concepts. We propose an approach to generating a natural language description of human behavior appearing in real video images. First, the head region of a human is extracted from each frame as a representative of the whole body. Using a model-based method, the three-dimensional pose and position of the head are estimated. Next, the trajectory of these parameters is divided into segments of monotonic motion. For each segment, we evaluate conceptual features such as the degree of change in pose and position and the degree of change in relative distance to objects in the surroundings. By calculating the product of these feature values, the most suitable verb is selected and the other syntactic elements are supplied. Finally, natural language text is generated using a machine translation technique.
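
The segmentation and verb-selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names (`split_monotonic`, `select_verb`), the feature names, the scoring functions, and the three example verbs are all hypothetical, and the trajectory is reduced to a single parameter for brevity.

```python
from math import prod


def split_monotonic(values, eps=1e-6):
    """Split a 1-D parameter trajectory into runs with a constant direction of change.

    Returns inclusive (start, end) index pairs; adjacent segments share
    their boundary sample. Changes smaller than eps count as "no change".
    """
    def sign(delta):
        if abs(delta) <= eps:
            return 0
        return 1 if delta > 0 else -1

    if len(values) < 2:
        return []

    segments = []
    start = 0
    current = sign(values[1] - values[0])
    for i in range(1, len(values) - 1):
        direction = sign(values[i + 1] - values[i])
        if direction != current:
            # Direction changed at sample i: close the running segment here.
            segments.append((start, i))
            start = i
            current = direction
    segments.append((start, len(values) - 1))
    return segments


def select_verb(features, verb_models):
    """Score each verb as the product of its per-feature scores; return the best.

    features:    feature name -> value in [0, 1]
    verb_models: verb -> {feature name -> function mapping value to a score in [0, 1]}
    """
    scores = {
        verb: prod(score(features[name]) for name, score in model.items())
        for verb, model in verb_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical verb models, invented for illustration only.
VERB_MODELS = {
    "walk": {"position_change": lambda x: x, "pose_change": lambda x: 1 - x},
    "stand": {"position_change": lambda x: 1 - x, "pose_change": lambda x: 1 - x},
    "turn": {"position_change": lambda x: 1 - x, "pose_change": lambda x: x},
}

if __name__ == "__main__":
    # Head height over seven frames: rising, constant, then falling.
    print(split_monotonic([0, 1, 2, 2, 2, 1, 0]))
    verb, scores = select_verb(
        {"position_change": 0.9, "pose_change": 0.1}, VERB_MODELS
    )
    print(verb, scores)
```

Taking a product rather than a sum means that a single feature scoring near zero rules a verb out, which is one plausible reading of why the abstract specifies a product.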

Keywords

image motion analysis; language translation; natural languages; surveillance; conceptual features; head region; human behavior; machine translation; model based method; monotonous motions; natural language description; natural language text; pose estimation; position estimation; syntactic elements; verb; video image; visual surveillance; AC generators; Biological system modeling; Humans; Image edge detection; Image segmentation; Layout; Magnetic heads; Natural languages; Surveillance; Vehicles

fLanguage

English

Publisher

IEEE

Conference_Titel

Pattern Recognition, 2000. Proceedings. 15th International Conference on

Conference_Location

Barcelona

ISSN

1051-4651

Print_ISBN

0-7695-0750-6

Type

conf

DOI

10.1109/ICPR.2000.903020

Filename

903020