DocumentCode :
1790924
Title :
Phoneme confusability reduction by using visual information in noisy environment
Author :
Varshney, Praveen ; Bansal, Ankur ; Farooq, Omar
Author_Institution :
Dept. of Electron. & Commun. Eng., GLA Univ., Mathura, India
fYear :
2014
fDate :
12-13 July 2014
Firstpage :
476
Lastpage :
481
Abstract :
Robust speech recognition has been a prominent research area in recent years. Phoneme identification is an important aspect of a speech recognition system. It is well established that the performance of a speech recognition system varies under different background conditions. Using visual information in speech recognition makes the system more robust to the problems associated with acoustic noise. In this paper, an automated Audio Visual Phoneme Recognition (AVPR) system is proposed and implemented for the Hindi language. A set of fifty sentences is used to extract samples of phoneme utterances and the corresponding viseme shapes. A Mel Frequency Cepstral Coefficient (MFCC) based technique is used to form the feature set for the audio signal. The Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) are used to extract the visual information. An early integration technique is used to combine the audio and visual feature sets. A discrimination analysis based classifier is applied for the recognition of phonemes. To show the effect of the interclass confusion associated with the viseme classes, the experiments are performed for 4 viseme classes and 8 viseme classes separately in clean and noisy background conditions. Visual information is utilized to decrease the effect of interclass confusion on phonemes. The overall maximum accuracy is 49.44% and 38.81% for 4 and 8 viseme classes, respectively, using linear discrimination. It is also established that an improvement of 2.91% and 6.07% is obtained by integrating visual information with the audio signal at -10 dB Signal to Noise Ratio (SNR).
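The two mechanical steps named in the abstract, mixing noise at a target SNR and early integration of the audio and visual feature sets, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the synthetic sine-wave "speech", the random placeholder features, and the feature dimensions (13 audio, 20 visual) are all assumptions, and early integration is taken here to mean plain per-sample concatenation.

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals `snr_db`, then mix."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return signal + scaled_noise, scaled_noise

def early_integration(audio_feats, visual_feats):
    """Concatenate audio and visual feature vectors sample by sample."""
    return np.concatenate([audio_feats, visual_feats], axis=1)

rng = np.random.default_rng(0)

# Synthetic stand-in for a speech signal (assumption: 1 s at 16 kHz).
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)

# Mix at -10 dB SNR, the condition quoted in the abstract.
noisy, scaled_noise = add_noise_at_snr(speech, noise, -10.0)
measured_snr_db = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))

# Placeholder features: random values standing in for MFCC (audio)
# and DWT/DCT (visual) vectors; the dimensions are hypothetical.
audio_feats = rng.standard_normal((5, 13))
visual_feats = rng.standard_normal((5, 20))
fused = early_integration(audio_feats, visual_feats)
```

The fused matrix has one row per phoneme sample and would then be passed to a classifier; the abstract names a discrimination analysis based classifier for that stage.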
Keywords :
cepstral analysis; discrete cosine transforms; discrete wavelet transforms; speech recognition; AVPR system; DCT; DWT; Hindi language; MFCC based technique; Mel frequency cepstral coefficient based technique; SNR; audio feature set; audio signal; automated audio visual phoneme recognition system; background conditions; clean background conditions; discrete cosine transform; discrete wavelet transform; discrimination analysis based classifier; integration technique; interclass confusion; noisy background conditions; phoneme confusability reduction; signal to noise ratio; speech recognition system; viseme classes; viseme shape; visual feature set; visual information; Feature extraction; Random access memory; Speech; Speech recognition; Vectors; Visualization; Vocabulary; DCT; DWT; Discrimination Analysis; Feature Extraction; MFCC; Speech Recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Propagation and Computer Technology (ICSPCT), 2014 International Conference on
Conference_Location :
Ajmer
Print_ISBN :
978-1-4799-3139-2
Type :
conf
DOI :
10.1109/ICSPCT.2014.6884883
Filename :
6884883