DocumentCode
332272
Title
Speech-to-lip movement synthesis maximizing audio-visual joint probability based on EM algorithm
Author
Nakamura, S. ; Yamamoto, E. ; Shikano, K.
Author_Institution
Nara Inst. of Sci. & Technol., Ikoma, Japan
fYear
1998
fDate
7-9 Dec 1998
Firstpage
53
Lastpage
58
Abstract
We investigate methods that use hidden Markov models (HMMs) to drive a lip movement sequence from input speech. We have previously investigated a mapping method based on the Viterbi decoding algorithm, which converts input speech into a lip movement sequence through the most likely HMM state sequence decoded by audio HMMs. However, this method has a substantial problem: it produces errors at incorrectly decoded HMM states. This paper proposes a new method that re-estimates the visual parameters using audio-visual joint probability HMMs under the expectation-maximization (EM) algorithm. In experiments, the proposed mapping method using the EM algorithm achieves a 26% error reduction over the Viterbi-based method at incorrectly decoded bilabial consonants.
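A brief illustrative sketch (not the authors' implementation) of the two mapping strategies the abstract contrasts, assuming each HMM state holds a joint Gaussian over stacked audio-visual feature vectors: the Viterbi-based map commits to a single decoded state path and so propagates any decoding error into the visual output, while the EM-style soft map weights each state's conditional visual estimate by forward-backward state posteriors. All function and variable names here are hypothetical.

    import numpy as np
    from scipy.stats import multivariate_normal

    def audio_likelihoods(A, means, covs, d_a):
        # b[t, j] = N(a_t; mu_a_j, S_aa_j): audio marginal of state j's joint Gaussian.
        T, J = A.shape[0], len(means)
        b = np.zeros((T, J))
        for j in range(J):
            b[:, j] = multivariate_normal.pdf(A, means[j][:d_a], covs[j][:d_a, :d_a])
        return b

    def viterbi_path(pi, trans, b):
        # Hard decoding: single most likely state sequence given the audio alone.
        T, J = b.shape
        delta = np.log(pi) + np.log(b[0])
        psi = np.zeros((T, J), dtype=int)
        logA = np.log(trans)
        for t in range(1, T):
            scores = delta[:, None] + logA           # scores[i, j]: from state i to j
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + np.log(b[t])
        path = np.empty(T, dtype=int)
        path[-1] = delta.argmax()
        for t in range(T - 2, -1, -1):
            path[t] = psi[t + 1, path[t + 1]]
        return path

    def state_posteriors(pi, trans, b):
        # Soft decoding: scaled forward-backward gives gamma[t, j] = P(q_t = j | audio).
        T, J = b.shape
        alpha = np.zeros((T, J)); beta = np.zeros((T, J)); c = np.zeros(T)
        alpha[0] = pi * b[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ trans) * b[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (trans @ (b[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)

    def conditional_visual_mean(mu, cov, a, d_a):
        # E[v | a] for one joint Gaussian partitioned as [audio; visual].
        mu_a, mu_v = mu[:d_a], mu[d_a:]
        S_aa, S_va = cov[:d_a, :d_a], cov[d_a:, :d_a]
        return mu_v + S_va @ np.linalg.solve(S_aa, a - mu_a)

    def map_audio_to_visual(A, pi, trans, means, covs, d_a, soft=True):
        # Per-frame visual estimate: posterior-weighted (soft, EM-style) or
        # single-path (hard, Viterbi-style) combination of E[v | a, state j].
        b = audio_likelihoods(A, means, covs, d_a)
        T, J = b.shape
        cond = np.stack([[conditional_visual_mean(means[j], covs[j], a, d_a)
                          for j in range(J)] for a in A])   # shape (T, J, d_v)
        if soft:
            w = state_posteriors(pi, trans, b)              # shape (T, J)
        else:
            w = np.eye(J)[viterbi_path(pi, trans, b)]       # one-hot per frame
        return np.einsum('tj,tjd->td', w, cond)

    # Tiny demo with random joint-Gaussian states (illustration only).
    rng = np.random.default_rng(0)
    J, d_a, d_v, T = 3, 4, 2, 10
    means = [rng.normal(size=d_a + d_v) for _ in range(J)]
    covs = []
    for _ in range(J):
        M = rng.normal(size=(d_a + d_v, d_a + d_v))
        covs.append(M @ M.T + np.eye(d_a + d_v))            # SPD joint covariance
    pi = np.full(J, 1.0 / J)
    trans = np.full((J, J), 1.0 / J)
    A = rng.normal(size=(T, d_a))
    print(map_audio_to_visual(A, pi, trans, means, covs, d_a, soft=True).shape)   # (10, 2)
    print(map_audio_to_visual(A, pi, trans, means, covs, d_a, soft=False).shape)  # (10, 2)

Under these assumptions the soft map averages over competing states rather than committing to one, so a mis-decoded frame contributes a blended visual estimate instead of the wrong state's output, which mirrors the motivation the abstract gives for the EM-based re-estimation.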
Keywords
audio signal processing; hidden Markov models; image motion analysis; image sequences; optimisation; parameter estimation; probability; speech coding; speech synthesis; EM algorithm; HMM state sequence; Viterbi decoding algorithm; audio HMM; audio-visual joint probability; error reduction; hidden Markov model; incorrectly decoded bi-labial consonants; input speech; lip movement sequence; mapping method; speech-to-lip movement synthesis; visual parameters re-estimation; Auditory system; Decoding; Frequency synthesizers; Hidden Markov models; Image sequences; Network synthesis; Parameter estimation; Speech processing; Speech synthesis; Viterbi algorithm
fLanguage
English
Publisher
ieee
Conference_Title
Multimedia Signal Processing, 1998 IEEE Second Workshop on
Conference_Location
Redondo Beach, CA
Print_ISBN
0-7803-4919-9
Type
conf
DOI
10.1109/MMSP.1998.738912
Filename
738912