Regional Information Center for Science and Technology - Audio-Visual Speech Processing Framework for Lip Reading

DocumentCode :

1576091

Title :

Audio-Visual Speech Processing Framework for Lip Reading

Author :

Nasr, Abdulbaset M. ; Ramli, Abd Rahman ; Hamiruce, Mohammad ; Subramaniam, Shamala K.

Author_Institution :

Robot. Lab., Univ. Putra Malaysia, Ehsan

fYear :

2008

Firstpage :

Lastpage :

Abstract :

It is well known that the speech production and perception processes are inherently bimodal, consisting of audio and visual components. Recently, there has been increased interest in using the visual modality in combination with the acoustic modality for improved speech processing. This field of study has become known as audio-visual speech processing. Lip movement recognition, also known as lip reading, is a communication skill that involves interpreting lip movements in order to estimate some important parameters of the lips, including, but not limited to, size, shape and orientation. In this paper, we present a hybrid framework for lip reading based on both audio and visual speech parameters extracted from a video stream of isolated spoken words. The proposed algorithm is self-tuned in the sense that it starts with an estimation of speech parameters based on visual lip features, and the coefficients of the algorithm are then fine-tuned based on the extracted audio parameters. In the audio speech processing part, the extracted audio features are used to generate a vector containing information about the speech phonemes. This information is used later to enhance the recognition and matching process. For lip feature extraction, we use a modified version of the method used by F. Huang and T. Chen for tracking multiple faces. This method is based on statistical color modeling and a deformable template. Experiments based on the proposed framework showed interesting results in the recognition of isolated words.
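The visual step described in the abstract relies on statistical color modeling of lip pixels, followed by estimation of lip size, shape and orientation. The sketch below illustrates that idea under stated assumptions. It is not the Huang and Chen method, nor the authors' modified version, and it omits both the deformable-template refinement and the audio-based fine-tuning. Several details are assumptions made only for illustration: a single Gaussian over normalized (r, g) chromaticity, a hypothetical set of hand-picked lip color samples, a threshold of 9.21 on the squared Mahalanobis distance (the 99% chi-square quantile for two degrees of freedom), and lip geometry summarized as an equivalent ellipse computed from image moments. All class and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]

def chromaticity(pixel: RGB) -> Tuple[float, float]:
    """Normalized (r, g) chromaticity; dividing by R+G+B discounts brightness."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # assumption: treat black as neutral
    return (r / total, g / total)

class GaussianColorModel:
    """Hypothetical single 2-D Gaussian over (r, g), fitted to sample lip pixels."""

    def __init__(self, samples: Sequence[RGB], eps: float = 1e-6) -> None:
        pts = [chromaticity(p) for p in samples]
        n = len(pts)
        self.mean = (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
        dr = [p[0] - self.mean[0] for p in pts]
        dg = [p[1] - self.mean[1] for p in pts]
        # Covariance entries; eps on the diagonal keeps the matrix invertible.
        s_rr = sum(d * d for d in dr) / n + eps
        s_gg = sum(d * d for d in dg) / n + eps
        s_rg = sum(a * b for a, b in zip(dr, dg)) / n
        det = s_rr * s_gg - s_rg * s_rg
        # Inverse covariance [[a, b], [b, c]] stored as (a, b, c).
        self.inv = (s_gg / det, -s_rg / det, s_rr / det)

    def mahalanobis_sq(self, pixel: RGB) -> float:
        """Squared Mahalanobis distance of the pixel's chromaticity from the mean."""
        r, g = chromaticity(pixel)
        dr, dg = r - self.mean[0], g - self.mean[1]
        a, b, c = self.inv
        return a * dr * dr + 2.0 * b * dr * dg + c * dg * dg

def lip_mask(image: List[List[RGB]], model: GaussianColorModel,
             threshold: float = 9.21) -> List[List[bool]]:
    """Label a pixel as lip when its Mahalanobis distance is within the threshold."""
    return [[model.mahalanobis_sq(px) <= threshold for px in row] for row in image]

def lip_parameters(mask: List[List[bool]]) -> Optional[dict]:
    """Size, shape and orientation of the lip region as an equivalent ellipse."""
    coords = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    # Second-order central moments, normalized by the region area.
    mu20 = sum((x - cx) ** 2 for x, _ in coords) / n
    mu02 = sum((y - cy) ** 2 for _, y in coords) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in coords) / n
    # Principal-axis angle in radians from the image x-axis (y grows downward).
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    half_sum = (mu20 + mu02) / 2.0
    root = math.sqrt(((mu20 - mu02) / 2.0) ** 2 + mu11 ** 2)
    lam_major, lam_minor = half_sum + root, max(half_sum - root, 0.0)
    # For a uniformly filled ellipse, variance along an axis is (semi-axis)^2 / 4,
    # so the full axis length is 4 * sqrt(variance).
    return {
        "area": n,
        "centroid": (cx, cy),
        "width": 4.0 * math.sqrt(lam_major),
        "height": 4.0 * math.sqrt(lam_minor),
        "orientation": theta,
    }

# Usage on a synthetic 20x20 frame: a 4x12 lip-colored block on a skin background.
skin, lip = (200, 150, 120), (170, 60, 80)
image = [[lip if 8 <= y <= 11 and 4 <= x <= 15 else skin for x in range(20)]
         for y in range(20)]
model = GaussianColorModel([(170, 60, 80), (165, 58, 78), (175, 62, 84)])
params = lip_parameters(lip_mask(image, model))
print(params["area"], params["centroid"])  # prints: 48 (9.5, 9.5)
```

Chromaticity is used instead of raw RGB so that the color model is less sensitive to illumination changes; a real lip tracker would still need a shape model (such as the deformable template the abstract mentions) to cope with pixels whose color resembles the lips.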

Keywords :

audio signal processing; estimation theory; feature extraction; image colour analysis; speech processing; video signal processing; audio-visual speech processing; communication skill; deformable template; lip movement recognition; lip reading; multiple face tracking; parameter estimation; statistical color modeling; video stream; Communications technology; Computer networks; Computer science; Data mining; Image processing; Robots; Systems engineering and theory

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information and Communication Technologies: From Theory to Applications, 2008. ICTTA 2008. 3rd International Conference on

Conference_Location :

Damascus

Print_ISBN :

978-1-4244-1751-3

Electronic_ISBN :

978-1-4244-1752-0

Type :

conf

DOI :

10.1109/ICTTA.2008.4530033

Filename :

4530033

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1576091