DocumentCode
332272
Title
Speech-to-lip movement synthesis maximizing audio-visual joint probability based on EM algorithm
Author
Nakamura, S. ; Yamamoto, E. ; Shikano, K.
Author_Institution
Nara Inst. of Sci. & Technol., Ikoma, Japan
fYear
1998
fDate
7-9 Dec 1998
Firstpage
53
Lastpage
58
Abstract
We investigate methods that use hidden Markov models (HMMs) to drive a lip movement sequence from input speech. We have previously investigated a mapping method based on the Viterbi decoding algorithm, which converts input speech into a lip movement sequence through the most likely HMM state sequence decoded by audio HMMs. However, this method has a substantial problem: it produces errors at incorrectly decoded HMM states. This paper proposes a new method that re-estimates the visual parameters using audio-visual joint probability HMMs under the expectation-maximization (EM) algorithm. In experiments, the proposed mapping method using the EM algorithm achieves a 26% error reduction over the Viterbi-based method at incorrectly decoded bilabial consonants.
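A brief illustrative sketch (not the authors' implementation) of the two mapping strategies the abstract contrasts, assuming each HMM state holds a joint Gaussian over stacked audio-visual feature vectors: the Viterbi-based map commits to a single decoded state path and so propagates any decoding error into the visual output, while the EM-style soft map weights each state's conditional visual estimate by forward-backward state posteriors. All function and variable names here are hypothetical.

    import numpy as np
    from scipy.stats import multivariate_normal

    def audio_likelihoods(A, means, covs, d_a):
        # b[t, j] = N(a_t; mu_a_j, S_aa_j): audio marginal of state j's joint Gaussian.
        T, J = A.shape[0], len(means)
        b = np.zeros((T, J))
        for j in range(J):
            b[:, j] = multivariate_normal.pdf(A, means[j][:d_a], covs[j][:d_a, :d_a])
        return b

    def viterbi_path(pi, trans, b):
        # Hard decoding: single most likely state sequence given the audio alone.
        T, J = b.shape
        delta = np.log(pi) + np.log(b[0])
        psi = np.zeros((T, J), dtype=int)
        logA = np.log(trans)
        for t in range(1, T):
            scores = delta[:, None] + logA           # scores[i, j]: from state i to j
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + np.log(b[t])
        path = np.empty(T, dtype=int)
        path[-1] = delta.argmax()
        for t in range(T - 2, -1, -1):
            path[t] = psi[t + 1, path[t + 1]]
        return path

    def state_posteriors(pi, trans, b):
        # Soft decoding: scaled forward-backward gives gamma[t, j] = P(q_t = j | audio).
        T, J = b.shape
        alpha = np.zeros((T, J)); beta = np.zeros((T, J)); c = np.zeros(T)
        alpha[0] = pi * b[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ trans) * b[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (trans @ (b[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)

    def conditional_visual_mean(mu, cov, a, d_a):
        # E[v | a] for one joint Gaussian partitioned as [audio; visual].
        mu_a, mu_v = mu[:d_a], mu[d_a:]
        S_aa, S_va = cov[:d_a, :d_a], cov[d_a:, :d_a]
        return mu_v + S_va @ np.linalg.solve(S_aa, a - mu_a)

    def map_audio_to_visual(A, pi, trans, means, covs, d_a, soft=True):
        # Per-frame visual estimate: posterior-weighted (soft, EM-style) or
        # single-path (hard, Viterbi-style) combination of E[v | a, state j].
        b = audio_likelihoods(A, means, covs, d_a)
        T, J = b.shape
        cond = np.stack([[conditional_visual_mean(means[j], covs[j], a, d_a)
                          for j in range(J)] for a in A])   # shape (T, J, d_v)
        if soft:
            w = state_posteriors(pi, trans, b)              # shape (T, J)
        else:
            w = np.eye(J)[viterbi_path(pi, trans, b)]       # one-hot per frame
        return np.einsum('tj,tjd->td', w, cond)

    # Tiny demo with random joint-Gaussian states (illustration only).
    rng = np.random.default_rng(0)
    J, d_a, d_v, T = 3, 4, 2, 10
    means = [rng.normal(size=d_a + d_v) for _ in range(J)]
    covs = []
    for _ in range(J):
        M = rng.normal(size=(d_a + d_v, d_a + d_v))
        covs.append(M @ M.T + np.eye(d_a + d_v))            # SPD joint covariance
    pi = np.full(J, 1.0 / J)
    trans = np.full((J, J), 1.0 / J)
    A = rng.normal(size=(T, d_a))
    print(map_audio_to_visual(A, pi, trans, means, covs, d_a, soft=True).shape)   # (10, 2)
    print(map_audio_to_visual(A, pi, trans, means, covs, d_a, soft=False).shape)  # (10, 2)

Under these assumptions the soft map averages over competing states rather than committing to one, so a mis-decoded frame contributes a blended visual estimate instead of the wrong state's output, which mirrors the motivation the abstract gives for the EM-based re-estimation.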
Keywords
audio signal processing; hidden Markov models; image motion analysis; image sequences; optimisation; parameter estimation; probability; speech coding; speech synthesis; EM algorithm; HMM state sequence; Viterbi decoding algorithm; audio HMM; audio-visual joint probability; error reduction; hidden Markov model; incorrectly decoded bi-labial consonants; input speech; lip movement sequence; mapping method; speech-to-lip movement synthesis; visual parameters re-estimation; Auditory system; Decoding; Frequency synthesizers; Hidden Markov models; Image sequences; Network synthesis; Parameter estimation; Speech processing; Speech synthesis; Viterbi algorithm
fLanguage
English
Publisher
ieee
Conference_Title
Multimedia Signal Processing, 1998 IEEE Second Workshop on
Conference_Location
Redondo Beach, CA
Print_ISBN
0-7803-4919-9
Type
conf
DOI
10.1109/MMSP.1998.738912
Filename
738912