Co-training non-robust classifiers for video semantic concept detection

Author

Yan, Rong; Naphade, Milind

Author_Institution

Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA

Volume

1

fYear

2005

fDate

11-14 Sept. 2005

Abstract

Semantic video characterization by automatic metadata tagging is increasingly popular. While some semantic concepts are unimodal, manifesting in either the image or the audio modality, a large number are multimodal, manifesting in both. Further, while some concepts, such as outdoors and face, occur frequently enough in training sets, a large number are rare, which makes them difficult to detect during automatic annotation. Semi-supervised learning algorithms such as co-training may help by incorporating a large amount of unlabeled data, holding the promise that the redundant information across views will improve learning performance. Unfortunately, this promise has not been realized in multimedia content analysis, partly because the models built from the labeled data alone are not very robust, and their noisy classification of the unlabeled data compounds the problems faced by the co-training algorithm. In this paper we analyze whether a judicious application of co-training, which automatically labels some of the unlabeled samples and re-inducts them into the training set under manual quality control, can improve detection performance. We report our findings in the context of the TRECVID 2003 common annotation corpus.
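To make the co-training idea in the abstract concrete, here is a minimal sketch of generic two-view co-training (in the style of Blum and Mitchell) with an optional manual quality-control hook. It is not the authors' procedure. Assumptions: each video sample has one feature vector per view (for example image features and audio features), labels are binary (1 = concept present), and a toy nearest-centroid classifier stands in for whatever real detectors would be used. All names (`co_train`, `NearestCentroid`, `accept`) are hypothetical and chosen for illustration. The simplification here picks the most confident samples by score magnitude rather than a fixed number of positives and negatives per round.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical toy binary classifier (labels 0/1); needs both classes."""

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "NearestCentroid":
        self.c1 = _mean([x for x, t in zip(X, y) if t == 1])
        self.c0 = _mean([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x: Vector) -> float:
        # Positive means closer to the positive centroid; |score| acts as confidence.
        return _dist2(x, self.c0) - _dist2(x, self.c1)


def co_train(view_a, view_b, labels, unlabeled_a, unlabeled_b,
             rounds: int = 5, per_round: int = 1,
             accept: Optional[Callable[[int, int], bool]] = None):
    """Simplified two-view co-training with an optional manual check.

    accept(i, label) is a hypothetical stand-in for human quality control:
    it returns True if the proposed label for unlabeled sample i is kept.
    """
    La, Lb, y = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unlabeled_a)))
    for _ in range(rounds):
        if not pool:
            break
        # Train one classifier per view on the current labeled set.
        clf_a = NearestCentroid().fit(La, y)
        clf_b = NearestCentroid().fit(Lb, y)
        for clf, view in ((clf_a, unlabeled_a), (clf_b, unlabeled_b)):
            # Each view proposes labels for the pool samples it is most confident about.
            ranked = sorted(pool, key=lambda i: abs(clf.score(view[i])), reverse=True)
            for i in ranked[:per_round]:
                label = 1 if clf.score(view[i]) > 0 else 0
                pool.remove(i)  # rejected samples are discarded, not retried
                if accept is not None and not accept(i, label):
                    continue
                # An accepted label is added for BOTH views, so each classifier
                # learns from the other view's confident predictions.
                La.append(unlabeled_a[i])
                Lb.append(unlabeled_b[i])
                y.append(label)
    return La, Lb, y
```

Passing an `accept` callback lets an annotator veto each automatically proposed label before it is re-inducted into the training set, which is one simple way to limit the error compounding that noisy, non-robust initial classifiers can cause.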

Keywords

image classification; image sequences; learning (artificial intelligence); video signal processing; TRECVID 2003 common annotation corpus; audio modality; automatic metadata tagging; cotraining nonrobust classifiers; image modality; multimedia content analysis; quality control; semisupervised learning algorithms; video semantic concept detection; Algorithm design and analysis; Face detection; Frequency; Labeling; Performance analysis; Quality control; Robustness; Semisupervised learning; Speech; Tagging;

fLanguage

English

Publisher

IEEE

Conference_Titel

Image Processing, 2005. ICIP 2005. IEEE International Conference on

Print_ISBN

0-7803-9134-9

Type

conf

DOI

10.1109/ICIP.2005.1529973

Filename

1529973