Title :
A Comparative Study of Audio Features for Audio-to-Visual Conversion in MPEG-4 Compliant Facial Animation
Author :
Xie, Lei; Liu, Zhi-Qiang
Author_Institution :
Sch. of Creative Media, City Univ. of Hong Kong, Kowloon
Abstract :
Audio-to-visual conversion is a fundamental problem in speech-driven facial animation. Since the task is to predict facial control parameters from acoustic speech, an informative audio representation, i.e., the audio feature, is crucial for accurate prediction. This paper presents a performance comparison of prosodic, articulatory, and perceptual features for the audio-to-visual conversion problem on a common test bed. Experimental results show that the Mel-frequency cepstral coefficients (MFCCs) produce the best performance, followed by the perceptual linear prediction coefficients (PLPCs), the linear predictive cepstral coefficients (LPCCs), and the prosodic feature set (F0 and energy). Combining the three kinds of features further improves the prediction of facial parameters, revealing that different audio features carry complementary information relevant to facial animation.
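The sketch below is not from the paper; it is a hedged illustration of two of the compared feature sets, MFCCs and a prosodic set (F0 and energy), extracted and stacked per frame. It uses librosa, a synthetic sine tone as a stand-in for real speech, and illustrative frame/hop settings rather than the paper's exact configuration; PLPC/LPCC extraction is omitted since librosa does not provide it directly.

```python
# Hedged sketch: frame-level MFCC + prosodic (F0, energy) features,
# analogous to the feature sets compared in the paper. All settings
# (16 kHz audio, 10 ms hop, 13 MFCCs) are illustrative assumptions.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 220 * t)   # placeholder "speech" signal

hop = 160                                     # 10 ms hop at 16 kHz
mfcc = librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=13, hop_length=hop)  # (13, T)
f0 = librosa.yin(speech, fmin=60, fmax=400, sr=sr, hop_length=hop)       # (T,)
energy = librosa.feature.rms(y=speech, hop_length=hop)                   # (1, T)

# Align frame counts (different analyses may differ by a frame or two),
# then stack into one feature matrix for a downstream audio-to-visual model.
T = min(mfcc.shape[1], f0.shape[0], energy.shape[1])
features = np.vstack([mfcc[:, :T], f0[None, :T], energy[:, :T]])          # (15, T)
print(features.shape)
```

In such a setup, each column of `features` would serve as the acoustic input for predicting the corresponding frame of facial control parameters; the paper's finding suggests that concatenating complementary feature sets in this way can improve that prediction.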
Keywords :
audio coding; computer animation; face recognition; video coding; LPCC; MPEG-4 compliant facial animation; PLPC; acoustic speech; articulatory features; audio-to-visual conversion; facial control parameter; informative representation; linear predictive cepstral coefficients; perceptual features; perceptual linear prediction coefficients; prosodic features; speech-driven facial animation; cybernetics; machine learning; facial animation; MPEG-4 Standard; MPEG-4; audio features; talking face
Conference_Titel :
2006 International Conference on Machine Learning and Cybernetics
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.259085