Regional Information Center for Science and Technology - Predicting Speaker Head Nods and the Effects of Affective Information

DocumentCode :

1324542

Title :

Predicting Speaker Head Nods and the Effects of Affective Information

Author :

Lee, Jina ; Marsella, Stacy C.

Author_Institution :

Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA

Volume :

Issue :

fYear :

2010

Firstpage :

552

Lastpage :

562

Abstract :

During face-to-face conversation, our body is continually in motion, displaying various head, gesture, and posture movements. Based on findings describing the communicative functions served by these nonverbal behaviors, many virtual agent systems have modeled them to make the virtual agent look more effective and believable. One channel of nonverbal behaviors that has received less attention is head movements, despite the important functions served by them. The goal for this work is to build a domain-independent model of speaker's head movements that could be used to generate head movements for virtual agents. In this paper, we present a machine learning approach for learning models of head movements by focusing on when speaker head nods should occur, and conduct evaluation studies that compare the nods generated by this work to our previous approach of using handcrafted rules. To learn patterns of speaker head nods, we use a gesture corpus and rely on the linguistic and affective features of the utterance. We describe the feature selection process and training process for learning hidden Markov models and compare the results of the learned models under varying conditions. The results show that we can predict speaker head nods with high precision (.84) and recall (.89) rates, even without a deep representation of the surface text, and that using affective information can help improve the prediction of the head nods (precision: .89, recall: .90). The evaluation study shows that the nods generated by the machine learning approach are perceived to be more natural in terms of nod timing than the nods generated by the rule-based approach.

Keywords :

gesture recognition; hidden Markov models; learning (artificial intelligence); motion compensation; multi-agent systems; pose estimation; prediction theory; affective information; domain independent model; face-to-face conversation; feature selection process; gesture corpus; learning hidden Markov models; machine learning; machine learning approach; rule based approach; speaker head nods prediction; speakers head movements; virtual agent systems; Data models; Hidden Markov models; Machine learning; Magnetic heads; Natural languages; Predictive models; Speech; Embodied conversational agents; emotion; head nods; machine learning; nonverbal behaviors; virtual agents

fLanguage :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/TMM.2010.2051874

Filename :

5571818

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1324542