DocumentCode :
569149
Title :
CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition
Author :
Zhao, Xuran ; Evans, Nicholas ; Dugelay, Jean-Luc
Author_Institution :
Dept. of Multimedia Commun., EURECOM, Sophia-Antipolis, France
fYear :
2012
fDate :
9-13 July 2012
Firstpage :
356
Lastpage :
361
Abstract :
Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of enrolment sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independency which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99%, while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.
Keywords :
audio-visual systems; biometrics (access control); data acquisition; face recognition; feature extraction; image representation; learning (artificial intelligence); speaker recognition; visual databases; MOBIO database; audio-visual biometric system; audio-visual person recognition; automatic face recognition; automatic speaker recognition; baseline identification rate; client model; co-LDA algorithm; co-training system; data acquisition; feature independency; feature sufficiency; labelled data; semi-supervised machine learning; unlabelled data; variational data representation; visual biometric feature; vocal biometric feature; Adaptation models; Data models; Face; Feature extraction; Training data; Vectors; Videos; Biometrics; audio-visual person recognition; co-training; face recognition; semi-supervised learning; speaker recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo (ICME), 2012 IEEE International Conference on
Conference_Location :
Melbourne, VIC
ISSN :
1945-7871
Print_ISBN :
978-1-4673-1659-0
Type :
conf
DOI :
10.1109/ICME.2012.14
Filename :
6298423