Title :
Local Ordinal Contrast Pattern Histograms for Spatiotemporal, Lip-Based Speaker Authentication
Author :
Chan, Chi Ho ; Goswami, Budhaditya ; Kittler, Josef ; Christmas, William
Author_Institution :
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
fDate :
4/1/2012
Abstract :
Lip region deformation during speech contains biometric information and is termed visual speech. This biometric information can be interpreted as genetic or behavioral, depending on whether static or dynamic features are extracted. In this paper, we use a texture descriptor called local ordinal contrast pattern (LOCP) with a dynamic texture representation called three orthogonal planes to represent both the appearance and the dynamic features observed in visual speech. This feature representation, when used in standard speaker verification engines, is shown to improve the performance of the lip-biometric trait compared to the state of the art. The best baseline state-of-the-art performance was a half total error rate (HTER) of 13.35% on the XM2VTS database. We obtained an HTER of less than 1%. The resilience of the LOCP texture descriptor to random image noise is also investigated. Finally, an analysis of how the amount of video information affects speaker verification performance suggests that, with the proposed approach, speaker identity can be verified with a much shorter biometric trait record than the length normally required for voice-based biometrics. In summary, the performance obtained is remarkable and suggests that there is enough discriminative information in the mouth region to enable its use as a primary biometric trait.
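The half total error rate quoted in the abstract is conventionally the mean of the false acceptance rate (FAR) and the false rejection rate (FRR) at a chosen decision threshold. The sketch below illustrates that standard definition only; it is not the authors' evaluation code. The function name `half_total_error_rate`, the toy scores, and the convention that a higher score means "more likely genuine" are all assumptions made for illustration.

```python
def half_total_error_rate(genuine_scores, impostor_scores, threshold):
    """Return (HTER, FAR, FRR) at a given threshold.

    Assumption: a claim is accepted when its score is >= threshold,
    i.e. higher scores indicate a more likely genuine speaker.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    # Genuine claims that fall below the threshold are falsely rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    # Impostor claims that reach the threshold are falsely accepted.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)

    frr = false_rejects / len(genuine_scores)
    far = false_accepts / len(impostor_scores)
    return (far + frr) / 2.0, far, frr


# Hypothetical toy scores, not data from the paper.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.6, 0.2, 0.3]
hter, far, frr = half_total_error_rate(genuine, impostor, threshold=0.5)
print(f"FAR={far:.2%} FRR={frr:.2%} HTER={hter:.2%}")
```

In verification protocols the threshold is normally fixed on a separate evaluation set and then applied unchanged to the test set, so the HTER measures performance at an operating point chosen without access to the test scores.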
Keywords :
biometrics (access control); error statistics; feature extraction; image recognition; image representation; image texture; spatiotemporal phenomena; speaker recognition; LOCP texture descriptor; XM2VTS database; baseline state-of-the-art performance; biometric information; dynamic feature extraction; dynamic texture representation; feature representation; half total error rate; lip region deformation; lip-biometric trait; local ordinal contrast pattern histogram; orthogonal planes; random image noise; spatiotemporal lip-based speaker authentication; speaker identity; standard speaker verification engine; static feature extraction; video information; visual speech; voice-based biometrics; Databases; Encoding; Face; Feature extraction; Histograms; Speech; Visualization; Biometrics; dynamic texture; lip; ordinal contrast; spatiotemporal; speaker verification; texture descriptor;
Journal_Title :
Information Forensics and Security, IEEE Transactions on
DOI :
10.1109/TIFS.2011.2175920