Title :
Mutual information eigenlips for audio-visual speech recognition
Author :
Arsic, Ivana ; Thiran, Jean-Philippe
Author_Institution :
Signal Process. Inst., Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland
Abstract :
This paper proposes an application of an information-theoretic approach to finding the most informative subset of eigen-features for audio-visual speech recognition tasks. State-of-the-art visual feature extraction methods in speechreading rely on pixel-based methods, geometric-based methods, or a combination of the two. However, there is no common rule defining how these features should be selected with respect to the chosen set of audio cues, or how well they represent the classes of the uttered speech. Our main objective is to exploit the complementarity of the audio and visual sources and to select meaningful visual descriptors by means of mutual information. We focus on the principal component projections of the mouth region images and apply the proposed method so that only those cues having the highest mutual information with the word classes are retained. The algorithm is tested by performing various speech recognition experiments on a chosen audio-visual dataset. The obtained recognition rates are compared to those acquired using conventional principal component analysis, and promising results are shown.
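The selection idea summarized in the abstract can be illustrated with a minimal sketch: project the mouth-region images onto their principal components, score each projection by its mutual information with the word-class labels, and keep the top-scoring ones instead of the highest-variance ones. The abstract does not state how mutual information is estimated or how many components are retained, so the histogram estimator, the bin count, and all function names below (`pca_projections`, `mutual_information`, `select_by_mutual_information`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def pca_projections(images, n_components):
    """Project flattened images (one per row) onto the top principal components."""
    X = images - images.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T


def mutual_information(feature, labels, n_bins=8):
    """Histogram estimate of I(feature; label) in bits (assumed estimator)."""
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    # Interior edges only, so bin indices fall in 0..n_bins-1.
    binned = np.digitize(feature, edges[1:-1])
    classes, label_idx = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    joint = np.zeros((n_bins, classes.size))
    np.add.at(joint, (binned, label_idx), 1.0)
    joint /= joint.sum()
    p_feature = joint.sum(axis=1, keepdims=True)
    p_label = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    ratio = joint[nonzero] / (p_feature @ p_label)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))


def select_by_mutual_information(projections, labels, k):
    """Return indices of the k projections with the highest MI, plus all scores."""
    scores = np.array([
        mutual_information(projections[:, j], labels)
        for j in range(projections.shape[1])
    ])
    order = np.argsort(scores)[::-1][:k]
    return order, scores
```

In this sketch a low-variance component that separates the classes well can outrank a high-variance but uninformative one, which is the contrast with conventional variance-ordered PCA that the abstract draws.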
Keywords :
audio-visual systems; eigenvalues and eigenfunctions; feature extraction; principal component analysis; speech recognition; audio-visual dataset; audio-visual speech recognition; geometric based method; information theoretic approach; mouth region image; mutual information eigenlips; principal component projection; speechreading; visual descriptor; visual feature extraction method; Accuracy; Mutual information; Speech; Vectors; Visualization
Conference_Title :
Signal Processing Conference, 2006 14th European
Conference_Location :
Florence