DocumentCode :
2135760
Title :
Audiovisual speech/speaker recognition, application to Arabic language
Author :
Chelali, Fatma Zohra ; Djeradi, Amar
Author_Institution :
Speech Commun. & Signal Process. Lab., Univ. of Sci. & Technol. Houari Boumedienne, Algiers, Algeria
fYear :
2011
fDate :
7-9 April 2011
Firstpage :
1
Lastpage :
7
Abstract :
Audio-only speaker/speech recognition (ASR) systems are far from perfect, especially under noisy conditions. Moreover, the content of speech can be partially revealed through lip-reading. Human speech perception is bimodal in nature: humans combine audio and visual information in deciding what has been spoken, especially in noisy environments. In this paper, we describe a speaker identification system in which lip information is fused with the corresponding speech information from each speaker. The energy, the zero cross ratio (ZCR) and the pitch are used as features for the audio modality. The features for the lip texture modality are 2D-DCT coefficients of the luminance component. Intuitively, we would expect lip information to be somewhat complementary to speech information, owing to the range of lip movements associated with the production of the corresponding phonemes in speech. Classification is performed with a multilayer perceptron.
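The abstract names its features without defining them, so the following is only a minimal sketch of how three of them are commonly computed: per-frame energy, the zero cross ratio (more commonly called the zero-crossing rate), and the 2D-DCT of a luminance block. The function names, the orthonormal DCT-II scaling, and the use of a square block are assumptions made for illustration. The paper's actual frame length, block size, coefficient selection, pitch estimator, and fusion scheme are not given in this record and are not reproduced here.

```python
import math


def short_time_energy(frame):
    """Sum of squared samples in one audio frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows.

    Assumed scaling: sqrt(1/n) for index 0, sqrt(2/n) otherwise.
    Direct O(n^4) evaluation, fine for small illustrative blocks.
    """
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (
                        block[i][j]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out
```

In a pipeline of this kind, the audio features would typically be computed per frame and a handful of low-frequency DCT coefficients kept per lip image, with both vectors then passed to the classifier. How the paper combines them is not specified in the abstract.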
Keywords :
discrete cosine transforms; multilayer perceptrons; natural language processing; signal classification; speaker recognition; speech intelligibility; 2D-DCT coefficient; Arabic language; audio information; audio modality; audio-only speaker recognition; audiovisual speech recognition; human speech perception; lip movement; lip texture modality; lip-reading; luminance component; multilayer perceptron classifier; noisy environment; phoneme; pitch; speaker identification system; speech content; speech recognition system; visual information; zero cross ratio; Correlation; Feature extraction; Mouth; Speech; Speech recognition; Visualization; Viseme classification for Arabic visual speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia Computing and Systems (ICMCS), 2011 International Conference on
Conference_Location :
Ouarzazate
ISSN :
Pending
Print_ISBN :
978-1-61284-730-6
Type :
conf
DOI :
10.1109/ICMCS.2011.5945713
Filename :
5945713