Title :
Geometrical and Pixel Based Lip Feature Fusion in Speech Synthesis System Driven by Visual-speech
Author_Institution :
Sch. of Inf. Eng., HeBei Univ. of Technol., Tianjin, China
Abstract :
Lipreading is applied to synthesize speech for speech-impaired people. To obtain a higher recognition rate, feature-level data fusion with weighting coefficients is used to integrate the lip information carried by different kinds of lip features. Experiments are carried out with HMMs that differ in the number of states and Gaussian mixture components, on a small database for the speaker-dependent case. The most important conclusion drawn from the recognition results is that, with an HMM of 4 states and 16 Gaussian mixture components, the integrated discriminative vector obtained by feature fusion outperforms the geometrical feature vector alone, the DCT descriptor vector alone, and the DCT coefficient vector alone. Compared with cascading the geometrical feature vector with the DCT descriptors, cascading the geometrical feature vector with the DCT coefficients integrates more information about the lip region, and the recognition rate is improved by as much as 3.18% with the best weighting coefficients (m:n = 1.5:1).
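The weighted feature-level fusion described in the abstract can be illustrated with a minimal sketch. It assumes that fusion means scaling each per-frame feature stream by its weight and concatenating (cascading) the two; that reading, the function name `fuse_features`, the array shapes, and the assignment of m to the geometrical stream and n to the DCT stream are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def fuse_features(geom, dct, m=1.5, n=1.0):
    """Weighted cascade of two per-frame feature streams.

    geom : (frames, G) array of geometrical lip features (hypothetical layout)
    dct  : (frames, D) array of DCT coefficients of the lip region
    m, n : weighting coefficients; which weight belongs to which stream
           is an assumption here, the abstract only gives m:n = 1.5:1
    """
    geom = np.asarray(geom, dtype=float)
    dct = np.asarray(dct, dtype=float)
    if geom.ndim != 2 or dct.ndim != 2:
        raise ValueError("both streams must be 2-D (frames x features)")
    if geom.shape[0] != dct.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    # Scale each stream by its weight, then concatenate along the feature axis.
    return np.hstack([m * geom, n * dct])

# Toy usage: 3 frames, 2 geometrical features, 4 DCT coefficients per frame.
geom = np.ones((3, 2))
dct = np.ones((3, 4))
fused = fuse_features(geom, dct)  # one (frames, G + D) observation matrix
```

The fused matrix would then serve as the observation sequence for HMM training and recognition. In practice the two streams would likely be normalized to comparable ranges before weighting, so that the ratio m:n reflects relative importance and not a difference in scale.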
Keywords :
Gaussian processes; discrete cosine transforms; feature extraction; hidden Markov models; image fusion; image sequences; speech synthesis; Gaussian mixture; data fusion; discriminate vector; geometrical based lip feature fusion; lipreading; pixel based lip feature fusion; speech synthesis system; speech-impaired people; visual-speech; weighting coefficients; Acoustics; Adaptation model; Computational modeling; Educational institutions
Conference_Title :
Computational Intelligence and Natural Computing Proceedings (CINC), 2010 Second International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-7705-0
DOI :
10.1109/CINC.2010.5643872