DocumentCode
64745
Title
Art Critic: Multisignal Vision and Speech Interaction System in a Gaming Context
Author
Reale, Michael J.; Liu, Peng; Yin, Lijun; Canavan, Shaun
Author_Institution
Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
Volume
43
Issue
6
fYear
2013
fDate
Dec. 2013
Firstpage
1546
Lastpage
1559
Abstract
True immersion of a player within a game can only occur when the simulated world looks and behaves as close to reality as possible. This implies that the game must correctly read and understand, among other things, the player's focus, attitude toward the objects/persons in focus, gestures, and speech. In this paper, we propose a novel system that integrates eye gaze estimation, head pose estimation, facial expression recognition, speech recognition, and text-to-speech components for use in real-time games. Both the eye gaze and head pose components utilize underlying 3-D models, and our novel head pose estimation algorithm uniquely combines scene flow with a generic head model. The facial expression recognition module uses the local binary patterns with three orthogonal planes approach on the 2-D shape index domain rather than the pixel domain, resulting in improved classification. Our system has also been extended to use a pan-tilt-zoom camera driven by the Kinect, allowing us to track a moving player. A test game, Art Critic, is also presented, which not only demonstrates the utility of our system but also provides a template for player/non-player character (NPC) interaction in a gaming context. The player alters his/her view of the 3-D world using head pose, looks at paintings/NPCs using eye gaze, and makes an evaluation based on the player's expression and speech. The NPC artist responds with facial expression and synthetic speech based on its personality. Both qualitative and quantitative evaluations of the system are performed to illustrate the system's effectiveness.
Keywords
cameras; computer games; face recognition; feature extraction; pose estimation; solid modelling; speech recognition; 2D shape index domain; 3D models; 3D world; Art Critic game; Kinect; eye gaze estimation; facial expression recognition; gaming context; generic head model; head pose estimation; local binary patterns; multisignal vision interaction system; orthogonal planes approach; pan-tilt-zoom camera; pixel domain; player attitude; player expression; player focus; player immersion; player speech; player-nonplayer character interaction; qualitative evaluation; quantitative evaluation; scene flow; speech interaction system; speech recognition; text-to-speech components; Cameras; Estimation; Face; Games; Solid modeling; Speech; Expression recognition; gaming interaction; gaze tracking; head pose estimation; speech recognition; text-to-speech;
fLanguage
English
Journal_Title
Cybernetics, IEEE Transactions on
Publisher
IEEE
ISSN
2168-2267
Type
jour
DOI
10.1109/TCYB.2013.2271606
Filename
6572826