DocumentCode :
3670761
Title :
Comparison of depth-based features for lipreading
Author :
Karel Paleček
Author_Institution :
The Institute of Information Technology and Electronics at the Technical University of Liberec
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
1
Lastpage :
4
Abstract :
We examine the effect of depth information captured by Microsoft Kinect on the task of visual speech recognition. We propose depth-based active appearance model (AAM) features and show improved results over discrete cosine transform (DCT) features. The visual and depth features are evaluated on a database of 54 speakers, each uttering 50 isolated words. To exploit the speech dynamics, the features are enhanced by a simplified one-stage variant of hierarchical linear discriminant analysis (Hi-LDA). In the experiments, we consider feature fusion via a combined video-depth active appearance model as a form of early integration, and compare it to a traditional multi-stream hidden Markov model as a form of decision fusion. We also perform experiments on audio-visual recognition in noisy environments and show that incorporating depth information improves results over both traditional audio-video fusion and the use of speech enhancement algorithms.
Keywords :
"Visualization","Active appearance model","Shape","Speech","Discrete cosine transforms","Hidden Markov models","Feature extraction"
Publisher :
IEEE
Conference_Title :
Telecommunications and Signal Processing (TSP), 2015 38th International Conference on
Type :
conf
DOI :
10.1109/TSP.2015.7296400
Filename :
7296400