Title :
CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition
Author :
Zhao, Xuran ; Evans, Nicholas ; Dugelay, Jean-Luc
Author_Institution :
Dept. of Multimedia Commun., EURECOM, Sophia-Antipolis, France
Abstract :
Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of enrolment sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independency which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99%, while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.
Keywords :
audio-visual systems; biometrics (access control); data acquisition; face recognition; feature extraction; image representation; learning (artificial intelligence); speaker recognition; visual databases; MOBIO database; audio-visual biometric system; audio-visual person recognition; automatic face recognition; automatic speaker recognition; baseline identification rate; client model; co-LDA algorithm; co-training system; data acquisition; feature independency; feature sufficiency; labelled data; semi-supervised machine learning; unlabelled data; variational data representation; visual biometric feature; vocal biometric feature; Adaptation models; Data models; Face; Feature extraction; Training data; Vectors; Videos; Biometrics; audio-visual person recognition; co-training; face recognition; semi-supervised learning; speaker recognition;
Conference_Title :
Multimedia and Expo (ICME), 2012 IEEE International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4673-1659-0
DOI :
10.1109/ICME.2012.14