Regional Information Center for Science and Technology (RICeST) - A 3-D Audio-Visual Corpus of Affective Communication

DocumentCode :

1324562

Title :

A 3-D Audio-Visual Corpus of Affective Communication

Author :

Fanelli, Gabriele ; Gall, Juergen ; Romsdorfer, Harald ; Weise, Thibaut ; Van Gool, Luc

Author_Institution :

Comput. Vision Lab., ETH Zurich, Zurich, Switzerland

Volume :

Issue :

Year :

2010

Firstpage :

591

Lastpage :

598

Abstract :

Communication between humans deeply relies on the capability of expressing and recognizing feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets still represent a bottleneck because of the difficulties arising during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus for possibly the two most important modalities used by humans to communicate their emotional states, namely speech and facial expression in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and resort to video clips to induce affective states. The annotation of the speech signal includes: transcription of the corpus text into the phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications like affective visual speech synthesis or view-independent facial expression recognition.

Keywords :

face recognition; geometry; speech processing; speech synthesis; video signal processing; 3-D audio-visual corpus; 3-D scanner; affective visual speech synthesis; dense dynamic facial geometries; fundamental frequency extraction; human-machine interaction; phone segmentation; phonological representation; signal intensity estimation; speech expression; video clips; view-independent facial expression recognition; Correlation; Databases; Face; Feature extraction; Geometry; Speech; Visualization; 3-D face modeling; Audio-visual database; emotional speech; face tracking; visual speech modeling

Language :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/TMM.2010.2052239

Filename :

5571821

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1324562