Regional Information Center for Science and Technology - Automatic visual feature extraction for Mandarin audio-visual speech recognition

DocumentCode :

2566160

Title :

Automatic visual feature extraction for Mandarin audio-visual speech recognition

Author :

Pao, Tsang-Long ; Liao, Wen-Yuan ; Wu, Tsan-Nung ; Lin, Ching-Yi

Author_Institution :

Dept. of Comput. Sci. & Eng., Tatung Univ., Taipei, Taiwan

fYear :

2009

fDate :

11-14 Oct. 2009

Firstpage :

2936

Lastpage :

2940

Abstract :

Automatic speech recognition (ASR) by machine has been an attractive research area over the past several decades. In recent years, many automatic speech-reading systems have been proposed that combine audio and visual speech features. In this paper, we propose an automatic approach for extracting visual features of the lips for use in an audio-visual speech recognition system. These features are important to the recognition system, especially in noisy conditions. The lip region is segmented using both color and edge information. We then establish a set of visual speech parameters and incorporate them into the recognizer. The WD-KNN classifier is used as the recognition engine. We report recognition performance using various visual features to explore their impact on recognition accuracy; these include geometric and motion features of the lips. Experimental results on Mandarin databases demonstrate that the visual information is highly effective for improving recognition performance.
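The abstract names two ingredients, geometric plus motion lip features and a WD-KNN classifier, without defining either. The sketch below is an illustration under stated assumptions, not the authors' method. It assumes each utterance yields per-frame lip width and height measurements (for example, from a segmented lip mask). The hypothetical helper `lip_features` summarizes these as mean width and height (geometric) and mean absolute frame-to-frame change (motion). The hypothetical `wd_knn_predict` reads WD-KNN as a per-class, distance-weighted KNN rule: it takes the k nearest training samples within each class, weights closer neighbours more heavily (weights k, k-1, ..., an assumption), and picks the class with the smallest weighted mean distance.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def lip_features(widths: Sequence[float], heights: Sequence[float]) -> Tuple[float, float, float, float]:
    """Hypothetical feature vector from per-frame lip width/height (needs >= 2 frames)."""
    if len(widths) != len(heights) or len(widths) < 2:
        raise ValueError("need matching width/height sequences with at least 2 frames")
    n = len(widths)
    # Geometric features: average lip opening over the utterance.
    mean_w = sum(widths) / n
    mean_h = sum(heights) / n
    # Motion features: average absolute change between consecutive frames.
    motion_w = sum(abs(widths[i + 1] - widths[i]) for i in range(n - 1)) / (n - 1)
    motion_h = sum(abs(heights[i + 1] - heights[i]) for i in range(n - 1)) / (n - 1)
    return (mean_w, mean_h, motion_w, motion_h)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wd_knn_predict(train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 3) -> str:
    """Per-class distance-weighted KNN (assumed reading of WD-KNN, not the paper's exact rule)."""
    if not train:
        raise ValueError("training set is empty")
    # Collect the distance from the query to every training sample, grouped by class.
    per_class: Dict[str, List[float]] = defaultdict(list)
    for features, label in train:
        per_class[label].append(euclidean(features, query))

    best_label, best_score = "", math.inf
    for label in sorted(per_class):
        nearest = sorted(per_class[label])[:k]
        # Assumed weights: the closest neighbour gets weight k, the next k-1, and so on.
        weights = [k - i for i in range(len(nearest))]
        # Normalize so classes with fewer than k samples are not favoured.
        score = sum(w * d for w, d in zip(weights, nearest)) / sum(weights)
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

In use, each training utterance would be turned into a vector with `lip_features` (optionally concatenated with audio features) and labelled with its syllable or word; a new utterance is then classified with `wd_knn_predict(train, lip_features(w, h), k=3)`. Normalizing by the weight sum is a design choice made here so that classes with few training samples are compared fairly.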

Keywords :

edge detection; feature extraction; image classification; image colour analysis; image motion analysis; image segmentation; speech processing; speech recognition; ASR; Mandarin audio-visual speech recognition; WD-KNN classifier; automatic visual feature extraction; color information; edge information; lip motion; lip region segmentation; noisy condition; speech-reading system; Automatic speech recognition; Computer science; Cybernetics; Data mining; Feature extraction; Lips; Speech analysis; Speech processing; Speech recognition; USA Councils; audio speech feature; audio-visual speech recognition; visual speech feature;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on

Conference_Location :

San Antonio, TX

ISSN :

1062-922X

Print_ISBN :

978-1-4244-2793-2

Electronic_ISBN :

1062-922X

Type :

conf

DOI :

10.1109/ICSMC.2009.5346011

Filename :

5346011

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2566160