Title :
A 3-D Audio-Visual Corpus of Affective Communication
Author :
Fanelli, Gabriele ; Gall, Juergen ; Romsdorfer, Harald ; Weise, Thibaut ; Van Gool, Luc
Author_Institution :
Comput. Vision Lab., ETH Zurich, Zurich, Switzerland
Abstract :
Communication between humans relies deeply on the ability to express and recognize feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite for which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties arising during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus covering what are possibly the two most important modalities humans use to communicate their emotional states, namely speech and facial expression, the latter in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and use video clips to induce affective states. The annotation of the speech signal includes transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications such as affective visual speech synthesis or view-independent facial expression recognition.
Keywords :
face recognition; geometry; speech processing; speech synthesis; video signal processing; 3-D audio-visual corpus; 3-D scanner; affective visual speech synthesis; dense dynamic facial geometries; fundamental frequency extraction; human-machine interaction; phone segmentation; phonological representation; signal intensity estimation; speech expression; video clips; view-independent facial expression recognition; Correlation; Databases; Face; Feature extraction; Geometry; Speech; Visualization; 3-D face modeling; Audio-visual database; emotional speech; face tracking; visual speech modeling
Journal_Title :
Multimedia, IEEE Transactions on
DOI :
10.1109/TMM.2010.2052239