DocumentCode :
1576091
Title :
Audio-Visual Speech Processing Framework for Lip Reading
Author :
Nasr, Abdulbaset M. ; Ramli, Abd Rahman ; Hamiruce, Mohammad ; Subramaniam, Shamala K.
Author_Institution :
Robot. Lab., Univ. Putra Malaysia, Ehsan
fYear :
2008
Firstpage :
1
Lastpage :
2
Abstract :
It is well known that speech production and perception are inherently bimodal, consisting of audio and visual components. Recently, there has been increased interest in using the visual modality in combination with the acoustic modality for improved speech processing. This field of study has become known as audio-visual speech processing. Lip movement recognition, also known as lip reading, is a communication skill that involves interpreting lip movements in order to estimate important parameters of the lips, including, but not limited to, size, shape, and orientation. In this paper, we present a hybrid framework for lip reading that is based on both audio and visual speech parameters extracted from a video stream of isolated spoken words. The proposed algorithm is self-tuned in the sense that it starts with an estimate of the speech parameters based on visual lip features, and the coefficients of the algorithm are then fine-tuned based on the extracted audio parameters. In the audio speech processing part, the extracted audio features are used to generate a vector containing information about the speech phonemes. This information is later used to enhance the recognition and matching process. For lip feature extraction, we use a modified version of the method used by F. Huang and T. Chen for tracking multiple faces. This method is based on statistical color modeling and a deformable template. Experiments based on the proposed framework showed interesting results in the recognition of isolated words.
Keywords :
audio signal processing; estimation theory; feature extraction; image colour analysis; speech processing; video signal processing; audio-visual speech processing; communication skill; deformable template; feature extraction; lip movement recognition; lip reading; multiple face tracking; parameter estimation; statistical color modeling; video stream; Communications technology; Computer networks; Computer science; Data mining; Feature extraction; Image processing; Parameter estimation; Robots; Speech processing; Systems engineering and theory; Image processing; Lip reading; Speech processing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Communication Technologies: From Theory to Applications, 2008. ICTTA 2008. 3rd International Conference on
Conference_Location :
Damascus
Print_ISBN :
978-1-4244-1751-3
Electronic_ISBN :
978-1-4244-1752-0
Type :
conf
DOI :
10.1109/ICTTA.2008.4530033
Filename :
4530033