DocumentCode
863786
Title
An Automatic Lipreading System for Spoken Digits With Limited Training Data
Author
Wang, S.L. ; Liew, A.W.C. ; Lau, W.H. ; Leung, S.H.
Author_Institution
Sch. of Inf. Security Eng., Shanghai Jiaotong Univ., Shanghai
Volume
18
Issue
12
fYear
2008
Firstpage
1760
Lastpage
1765
Abstract
It is well known that visual cues of lip movement contain important speech-relevant information. This paper presents an automatic lipreading system for small-vocabulary speech recognition tasks. Using the lip segmentation and modeling techniques we developed earlier, we obtain a visual feature vector composed of outer and inner mouth features from the lip image sequence for recognition. A spline representation is employed to transform the discrete-time sampled features from the video frames into the continuous domain. The spline coefficients within the same word class are constrained to have similar expressions and are estimated from the training data by the EM algorithm. For the multiple-speaker/speaker-independent recognition task, an adaptive multimodel approach is proposed to handle the variations caused by different talking styles. After building the appropriate word models from the spline coefficients, a maximum likelihood classification approach is used for recognition. Lip image sequences of the English digits 0 to 9 have been collected for the recognition test. Two widely used classification methods, HMM and RDA, have been adopted for comparison, and the results show that the proposed algorithm delivers the best performance among these methods.
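The pipeline in the abstract (per-frame mouth features, a spline fit over continuous time, word models built from the spline coefficients, maximum likelihood classification) can be illustrated with a minimal sketch, assuming Python with NumPy. It fits a least-squares cubic spline (truncated-power basis, hypothetical interior knots at 0.25/0.5/0.75 of normalized time) to each utterance, models each word by a diagonal Gaussian over the coefficients, and picks the most likely word. It does not reproduce the paper's EM estimation with within-class coefficient constraints or the adaptive multimodel approach; all function names, the knot placement, the Gaussian word model, and the toy 2-D feature tracks are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Hypothetical interior knots on normalized time; not taken from the paper.
KNOTS = (0.25, 0.5, 0.75)


def spline_basis(t, knots=KNOTS):
    """Evaluate a cubic truncated-power spline basis at normalized times t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t ** 2, t ** 3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.stack(cols, axis=1)  # shape (T, 4 + len(knots))


def spline_coefficients(features):
    """Least-squares spline fit of a (T, D) per-frame feature trajectory.

    Frame indices are mapped onto [0, 1], so utterances of different lengths
    yield coefficient vectors of the same size, (4 + len(KNOTS)) * D.
    """
    features = np.asarray(features, dtype=float)
    t = np.linspace(0.0, 1.0, features.shape[0])
    coef, *_ = np.linalg.lstsq(spline_basis(t), features, rcond=None)
    return coef.ravel()


def train_word_models(utterances_by_word, var_floor=1e-3):
    """Fit one diagonal Gaussian over spline coefficients per word class."""
    models = {}
    for word, utterances in utterances_by_word.items():
        C = np.stack([spline_coefficients(u) for u in utterances])
        models[word] = (C.mean(axis=0), C.var(axis=0) + var_floor)
    return models


def log_likelihood(c, mean, var):
    """Log density of coefficient vector c under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (c - mean) ** 2 / var)


def classify(features, models):
    """Return the word whose model assigns the highest likelihood."""
    c = spline_coefficients(features)
    return max(models, key=lambda word: log_likelihood(c, *models[word]))


def toy_utterance(curve, n_frames):
    """Hypothetical 2-D mouth-feature track (e.g. width, height) for demonstration."""
    t = np.linspace(0.0, 1.0, n_frames)
    y = curve(t)
    return np.stack([y, 0.5 * y], axis=1)


if __name__ == "__main__":
    train = {
        "zero": [toy_utterance(lambda t: np.sin(np.pi * t), n) for n in (20, 25, 30)],
        "one": [toy_utterance(lambda t: 1.0 - t, n) for n in (22, 27, 31)],
    }
    models = train_word_models(train)
    print(classify(toy_utterance(lambda t: np.sin(np.pi * t), 24), models))
```

Because every utterance is resampled onto the same continuous time axis, word models compare fixed-length coefficient vectors regardless of how many video frames each utterance spans; the variance floor keeps the Gaussian well defined when only a few training samples per word are available.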
Keywords
feature extraction; image classification; image segmentation; image sequences; speech recognition; automatic lipreading system; discrete-time sampled features; image recognition; limited training data; lip image sequence; lip modeling technique; lip movement; lip segmentation; maximum likelihood classification; mouth features; same word class; speech relevant information; spline coefficients; spline representation; spoken digits; talking styles; video frames; visual cues; visual feature vector; vocabulary speech recognition tasks; word models; Lipreading; visual feature extraction; visual speech recognition;
fLanguage
English
Journal_Title
Circuits and Systems for Video Technology, IEEE Transactions on
Publisher
IEEE
ISSN
1051-8215
Type
jour
DOI
10.1109/TCSVT.2008.2004924
Filename
4625976