Regional Information Center for Science and Technology - Regression based landmark estimation and multi-feature fusion for visual speech recognition

DocumentCode :

3707328

Title :

Regression based landmark estimation and multi-feature fusion for visual speech recognition

Author :

Hong Liu;Xuewu Zhang;Pingping Wu

Author_Institution :

Key Laboratory of Machine Perception (Ministry of Education), Engineering Lab on Intelligent Perception for Internet of Things (ELIP), Shenzhen Graduate School, Peking University, China

Year :

2015

Firstpage :

808

Lastpage :

812

Abstract :

Visual speech recognition, also known as lipreading, can improve the robustness of automatic acoustic speech recognition, especially in noisy environments. However, it remains a challenging topic because of the variety of speaking characteristics and the confusion between visual speech features. In this paper, we propose an automatic lipreading method that uses a new lip tracking method and the fusion of multiple visual features to tackle the problem. First, regression-based face landmark estimation is employed for lip detection, and a geometry-based shape invariant feature (SIF) is built on the detected landmarks. Moreover, it can also be applied to the removal of non-speaking utterances. Then motion interchange patterns and spatial-temporal descriptors are also adopted to describe the lip information, and a Bayes combination strategy is applied to fuse them. The proposed method is evaluated on three benchmark data sets: AVLetters2, OuluVS and PKUVS. The experimental results are promising and show the effectiveness of the proposed approach.
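
The abstract does not spell out the Bayes combination strategy, so the following is only a minimal sketch of one common form of it: product-rule fusion of per-feature class posteriors under a conditional-independence assumption. The function name `bayes_fuse`, the three feature streams and the posterior values are all hypothetical and are not taken from the paper.

```python
import math


def bayes_fuse(posteriors, priors):
    """Fuse per-feature class posteriors with the Bayes product rule.

    Assumption (not from the paper): the feature streams are conditionally
    independent given the class, so that
        P(c | x_1..x_K)  is proportional to  P(c)^(1-K) * prod_k P(c | x_k).

    posteriors[k][c] is P(class c | feature stream k);
    priors[c] is P(class c).
    Returns the normalized fused posterior as a list.
    """
    n_streams = len(posteriors)
    eps = 1e-12  # guards against log(0)
    log_scores = []
    for c, prior in enumerate(priors):
        score = (1 - n_streams) * math.log(prior + eps)
        for stream in posteriors:
            score += math.log(stream[c] + eps)
        log_scores.append(score)
    # Normalize in a numerically stable way (subtract the max before exp).
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical posteriors over three word classes from three feature streams
# (shape, motion, spatial-temporal); the numbers are made up for illustration.
shape_post = [0.5, 0.3, 0.2]
motion_post = [0.4, 0.4, 0.2]
spatiotemporal_post = [0.6, 0.2, 0.2]
uniform_prior = [1 / 3, 1 / 3, 1 / 3]

fused = bayes_fuse([shape_post, motion_post, spatiotemporal_post], uniform_prior)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Working in log space keeps the product stable when many streams or many classes are involved; with a uniform prior the prior term cancels out in the normalization.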

Keywords :

"Shape","Mouth","Visualization","Face","Feature extraction","Speech recognition","Speech"

Publisher :

IEEE

Conference_Title :

Image Processing (ICIP), 2015 IEEE International Conference on

Type :

conf

DOI :

10.1109/ICIP.2015.7350911

Filename :

7350911

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3707328