Title :
Realistic mouth animation based on an articulatory DBN model with constrained asynchrony
Author :
Jiang, Dongmei ; Ravyse, Ilse ; Liu, Peizhen ; Sahli, Hichem ; Verhelst, Werner
Author_Institution :
VUB-NPU Joint Res. Group on Audio Visual Signal Process. (AVSP), Northwestern Polytech. Univ., Xi'an, China
Abstract :
In this paper, we propose an approach to convert acoustic speech into video-realistic mouth animation based on an articulatory dynamic Bayesian network model with constrained asynchrony (AF_AVDBN). Conditional probability distributions are defined to control the asynchronies between articulators such as the lips, tongue and glottis/velum. An EM-based conversion algorithm is also presented to learn the optimal visual features given an auditory input and the trained AF_AVDBN parameters. For training the AF_AVDBN models, downsampled YUV spatial frequency features of the interpolated mouth image sequences are extracted as visual features. To reproduce the mouth animation sequence from the learned visual features, spatial upsampling and temporal downsampling are applied. Both qualitative and quantitative results show that the proposed method produces more natural and realistic mouth animations, with further improved accuracy compared to the state-of-the-art multi-stream Hidden Markov Model (MSHMM) and the articulatory DBN model without asynchrony constraints (AF_DBN).
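The abstract describes the visual-feature pipeline only at a high level; the following Python sketch illustrates what "downsampled YUV features of a mouth image" and the subsequent "spatial upsampling" for animation reproduction could look like. It is not the authors' implementation: the feature grid size, output resolution, and interpolation methods are illustrative assumptions, and OpenCV/NumPy are assumed to be available.

    # Illustrative sketch (not the paper's code): YUV-based visual feature
    # extraction by spatial downsampling, and reconstruction by upsampling.
    import cv2
    import numpy as np

    FEATURE_SIZE = (16, 16)  # assumed spatial resolution of the feature grid

    def extract_visual_features(mouth_bgr: np.ndarray) -> np.ndarray:
        """Convert a mouth image to YUV and downsample it to a flat feature vector."""
        yuv = cv2.cvtColor(mouth_bgr, cv2.COLOR_BGR2YUV)
        small = cv2.resize(yuv, FEATURE_SIZE, interpolation=cv2.INTER_AREA)
        return small.astype(np.float32).ravel()

    def reconstruct_mouth(features: np.ndarray, out_size=(64, 64)) -> np.ndarray:
        """Spatially upsample a learned feature vector back to a viewable mouth image."""
        small = features.reshape(FEATURE_SIZE[1], FEATURE_SIZE[0], 3).astype(np.uint8)
        yuv = cv2.resize(small, out_size, interpolation=cv2.INTER_CUBIC)
        return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)

In the paper's setting, the learned feature sequence would additionally be temporally downsampled to the target video frame rate before rendering; the exact rates are not given here.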
Keywords :
belief networks; computer animation; constraint handling; feature extraction; hearing; hidden Markov models; image sequences; interpolation; speech processing; statistical distributions; EM-based conversion algorithm; acoustic speech; articulatory DBN model; articulatory dynamic Bayesian network model; auditory input; constrained asynchrony; downsampled YUV spatial frequency; feature extraction; interpolated mouth image sequences; multistream Hidden Markov Model; optimal visual features; probability distributions; spatial upsampling; temporal downsampling; video realistic mouth animation; Animation; Bayesian methods; Frequency; Hidden Markov models; Image converters; Lips; Mouth; Probability distribution; Speech; Tongue; AF_AVDBN; AF_DBN; asynchrony; conditional probability distribution; mouth animation;
Conference_Titel :
2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2010.5494894