DocumentCode
2240922
Title
Integrating co-training and recognition for text detection
Author
Wu, Wen ; Chen, Datong ; Yang, Jie
Author_Institution
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear
2005
fDate
6-8 July 2005
Abstract
Training a good text detector requires a large amount of labeled data, which can be very expensive to obtain. Co-training has been shown to be a powerful semi-supervised learning tool that solves many problems by exploiting a large amount of unlabeled data. However, the augmented data produced by a co-training process can degrade classifier performance because of noise introduced by the unlabeled data. This paper makes two contributions by proposing a modified co-training scheme for text detection. First, to obtain cleaner augmented data, the new algorithm integrates authority knowledge about the unlabeled data into co-training: the text recognition output for each selected unlabeled image patch serves as the authority and is combined with the classifier prediction to decide whether the sample is added to the augmented set. Second, instead of combining the predictions of the two co-training classifiers evenly, a weighted combination is learned and used to produce the final prediction. Both contributions are evaluated on a standard text detection dataset.
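The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the recognition output is combined with the classifier prediction or how the weight is learned, so the agreement rule, the confidence threshold, the grid search, and every function name below (select_augmented, combine, learn_weight) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def select_augmented(
    scores: Sequence[float],
    ocr_found_text: Sequence[bool],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs for unlabeled patches to add.

    scores[i] is one classifier's estimated probability that patch i is text.
    ocr_found_text[i] is a hypothetical recognizer verdict (the "authority").
    Assumed rule: keep a patch only when the classifier is confident AND
    the recognizer agrees with the predicted label.
    """
    selected = []
    for i, (p, ocr) in enumerate(zip(scores, ocr_found_text)):
        if p >= threshold and ocr:
            selected.append((i, 1))  # confident "text", recognizer agrees
        elif p <= 1.0 - threshold and not ocr:
            selected.append((i, 0))  # confident "non-text", recognizer agrees
    return selected


def combine(p1: float, p2: float, w: float) -> float:
    """Weighted combination of the two classifiers' text probabilities."""
    return w * p1 + (1.0 - w) * p2


def learn_weight(
    preds1: Sequence[float],
    preds2: Sequence[float],
    labels: Sequence[int],
    steps: int = 20,
) -> float:
    """Pick the weight with the best accuracy on held-out labeled data.

    Assumption: a simple grid search over w in [0, 1]; ties keep the
    smallest w found first.
    """
    best_w, best_acc = 0.5, -1.0
    for k in range(steps + 1):
        w = k / steps
        correct = sum(
            int((combine(a, b, w) >= 0.5) == bool(y))
            for a, b, y in zip(preds1, preds2, labels)
        )
        acc = correct / len(labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w
```

In a full co-training loop, each classifier would score the unlabeled patches, select_augmented would filter the candidates, and the accepted patches would be added to the other classifier's training set before retraining. An even combination corresponds to w = 0.5 in combine.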
Keywords
character recognition; image classification; learning (artificial intelligence); text analysis; augmented data; authority knowledge; classifier prediction; modified cotraining scheme; semisupervised learning tool; standard text detection dataset; text recognition; unlabeled image patch; weighted combination; Computer science; Degradation; Detectors; Image edge detection; Semisupervised learning; Supervised learning; Testing; Text recognition; Training data; Videos
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on
Print_ISBN
0-7803-9331-7
Type
conf
DOI
10.1109/ICME.2005.1521634
Filename
1521634