DocumentCode :
3707328
Title :
Regression based landmark estimation and multi-feature fusion for visual speech recognition
Author :
Hong Liu;Xuewu Zhang;Pingping Wu
Author_Institution :
Key Laboratory of Machine Perception (Ministry of Education), Engineering Lab on Intelligent Perception for Internet of Things (ELIP), Shenzhen Graduate School, Peking University, China
fYear :
2015
Firstpage :
808
Lastpage :
812
Abstract :
Visual speech recognition, also known as lipreading, can improve the robustness of automatic acoustic speech recognition, especially in noisy environments. However, it remains a challenging topic given the variety of speaking characteristics and the confusion between visual speech features. In this paper, we propose an automatic lipreading method that uses a new lip tracking method and multiple visual information fusion to tackle the problem. First, a regression-based face landmark estimation method is employed for lip detection, on the basis of which a geometric shape invariant feature (SIF) is put forward. Moreover, it can also be applied to the removal of non-speaking utterances. Then, motion interchange patterns and spatial-temporal descriptors are also adopted to describe the lip information, with a Bayes combination strategy applied to combine them. The proposed method is evaluated on three benchmark data sets: AVLetters2, OuluVS and PKUVS. Experimental results are promising and show the effectiveness of the proposed approach.
Keywords :
"Shape","Mouth","Visualization","Face","Feature extraction","Speech recognition","Speech"
Publisher :
IEEE
Conference_Titel :
Image Processing (ICIP), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/ICIP.2015.7350911
Filename :
7350911