Regional Information Center for Science and Technology - Audiovisual speech/speaker recognition, application to Arabic language

DocumentCode :

2135760

Title :

Audiovisual speech/speaker recognition, application to Arabic language

Author :

Chelali, Fatma Zohra ; Djeradi, Amar

Author_Institution :

Speech Commun. & Signal Process. Lab., Univ. of Sci. & Technol. Houari Boumedienne, Algiers, Algeria

fYear :

2011

fDate :

7-9 April 2011

Firstpage :

Lastpage :

Abstract :

Audio-only speaker/speech recognition (ASR) systems are far from perfect, especially under noisy conditions. It is also known that the content of speech can be partially revealed through lip-reading. Human speech perception is bimodal: humans combine audio and visual information to decide what has been spoken, especially in noisy environments. In this paper, we describe a speaker identification system in which lip information from each speaker is fused with the corresponding speech information. The energy, the zero-crossing rate (ZCR) and the pitch are used as features for the audio modality; the features for the lip texture modality are 2D-DCT coefficients of the luminance component. Intuitively, we would expect lip information to be somewhat complementary to speech information, given the range of lip movements associated with producing the corresponding phonemes. Classification is performed with a multilayer perceptron (MLP) classifier.
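The abstract names the per-modality features but gives no formulas or parameters. The following is a minimal pure-Python sketch, not the authors' implementation, of short-time energy, zero-crossing rate, an autocorrelation pitch estimate and low-order 2D-DCT coefficients of a luminance patch. All of the following are assumptions: the frame length, the pitch search range, the choice of autocorrelation as the pitch estimator, keeping the top-left k x k DCT coefficients, and fusing by simple concatenation. The helper names (`fused_feature_vector` and so on) are hypothetical.

```python
import math
from typing import List, Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in one frame (one common definition)."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame: Sequence[float], sample_rate: int,
                   f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Crude pitch estimate: lag of the autocorrelation peak within [f_min, f_max] Hz.

    Assumption: the paper does not state which pitch estimator it uses.
    """
    n = len(frame)
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def dct2(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2D DCT-II of an M x N block (naive loops; fine for a small lip ROI)."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        cu = math.sqrt(1.0 / m) if u == 0 else math.sqrt(2.0 / m)
        for v in range(n):
            cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            s = 0.0
            for x in range(m):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = cu * cv * s
    return out


def lowfreq_dct_features(luma: List[List[float]], k: int = 3) -> List[float]:
    """Assumption: keep the top-left k x k (lowest-frequency) DCT coefficients."""
    coeffs = dct2(luma)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def fused_feature_vector(audio_frame: Sequence[float], sample_rate: int,
                         lip_luma: List[List[float]], k: int = 3) -> List[float]:
    """Hypothetical feature-level fusion: concatenate audio and lip features."""
    audio = [short_time_energy(audio_frame),
             zero_crossing_rate(audio_frame),
             autocorr_pitch(audio_frame, sample_rate)]
    return audio + lowfreq_dct_features(lip_luma, k)


if __name__ == "__main__":
    sr = 8000
    frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]  # 50 ms, 200 Hz tone
    lip = [[0.5] * 8 for _ in range(8)]  # placeholder 8x8 luminance patch
    vec = fused_feature_vector(frame, sr, lip)
    print(len(vec), round(vec[2], 1))
```

In a setup like the one the abstract describes, vectors of this kind would be the input to the MLP classifier. Concatenation is only one of several possible fusion strategies, and the record does not say which one the paper uses.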

Keywords :

discrete cosine transforms; multilayer perceptrons; natural language processing; signal classification; speaker recognition; speech intelligibility; 2D-DCT coefficient; Arabic language; audio information; audio modality; audio-only speaker recognition; audiovisual speech recognition; human speech perception; lip movement; lip texture modality; lip-reading; luminance component; multilayer perceptron classifier; noisy environment; phoneme; pitch; speaker identification system; speech content; speech recognition system; visual information; zero cross ratio; Correlation; Discrete cosine transforms; Feature extraction; Mouth; Speech; Speech recognition; Visualization; Viseme classification for Arabic visual speech recognition; speech recognition

fLanguage :

English

Publisher :

ieee

Conference_Title :

Multimedia Computing and Systems (ICMCS), 2011 International Conference on

Conference_Location :

Ouarzazate

ISSN :

Pending

Print_ISBN :

978-1-61284-730-6

Type :

conf

DOI :

10.1109/ICMCS.2011.5945713

Filename :

5945713

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2135760