Title :
Rendering a personalized photo-real talking head from short video footage
Author :
Wang, Lijuan ; Han, Wei ; Qian, Xiaojun ; Soong, Frank K.
Author_Institution :
Microsoft Res. Asia, Beijing, China
Date :
Nov. 29 2010-Dec. 3 2010
Abstract :
In this paper, we propose an HMM trajectory-guided, real-image-sample concatenation approach to photo-real talking head synthesis. An audio-visual database of a person is recorded first for training a statistical Hidden Markov Model (HMM) of lip movement. The HMM is then used to generate the dynamic trajectory of lip movement for given speech signals in the maximum-probability sense. The generated trajectory is then used as a guide to select, from the original training database, an optimal sequence of lip images, which are then stitched back onto a background head video. The whole procedure is fully automatic and data-driven. From as little as 20 minutes of recorded audio/video footage, the proposed system can synthesize a highly photo-real talking head in sync with the given speech signals (natural or TTS-synthesized). The system won first place in the A/V consistency contest of the LIPS Challenge (2009), as perceptually evaluated by recruited human subjects.
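The trajectory-guided selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a generic Viterbi-style unit selection in which each frame's candidate lip images are scored by a target cost (distance to the HMM-generated trajectory) plus a concatenation cost (distance between consecutively chosen samples), and the globally cheapest path is traced back. All function and parameter names here are illustrative.

```python
import numpy as np

def select_samples(target, candidates, w_concat=1.0):
    """Viterbi-style selection of one stored lip sample per frame.

    target:     (T, D) array, HMM-generated lip-feature trajectory
    candidates: list of T arrays, each (K_t, D), feature vectors of
                database lip images considered for that frame
    Returns a list of T indices, one chosen candidate per frame,
    minimizing target cost plus w_concat * concatenation cost.
    """
    T = len(target)
    # target cost: distance of every candidate to the guide trajectory
    tc = [np.linalg.norm(c - target[t], axis=1) for t, c in enumerate(candidates)]
    cost = tc[0].copy()     # running best path cost per candidate
    back = []               # backpointers for traceback
    for t in range(1, T):
        # pairwise concatenation cost between frame t-1 and frame t candidates
        cc = np.linalg.norm(
            candidates[t - 1][:, None, :] - candidates[t][None, :, :], axis=2)
        total = cost[:, None] + w_concat * cc       # (K_prev, K_cur)
        back.append(np.argmin(total, axis=0))       # best predecessor per candidate
        cost = total.min(axis=0) + tc[t]
    # trace back the optimal candidate path
    path = [int(np.argmin(cost))]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

In a full system the chosen images would then be stitched onto the background head video; here the sketch stops at index selection, which is the part the trajectory actually guides.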
Keywords :
hidden Markov models; image sampling; image sequences; realistic images; rendering (computer graphics); speech synthesis; video signal processing; visual databases; audio visual database; image sample concatenation approach; image sequence; lip movement; maximum probability sense; photo real talking head synthesis; real time rendering; speech signals; video footage; Hidden Markov models; Lips; Magnetic heads; Training; Trajectory; Visualization; photo-real; talking head; trajectory-guided; visual speech synthesis;
Conference_Titel :
2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684834