Title :
Greedy Search for Active Learning of OCR
Author :
Agarwal, Abhishek ; Garg, Radhika ; Chaudhury, Santanu
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Delhi, New Delhi, India
Abstract :
Active learning and crowdsourcing are becoming increasingly popular in the machine learning community as fast, cost-effective ways to generate labels for large volumes of data. However, such labels may be noisy, so noisy labels must be discarded to build a good classifier. We propose a framework for finding the best possible augmentation of a classifier for the character recognition problem using the minimum number of crowd-labeled samples. The approach inherently rejects noisy data and tries to accept the subset of correctly labeled data that maximizes classifier performance.
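The abstract describes the approach only at a high level. One way to read "greedy augmentation" is sketched below under stated assumptions; it is not the authors' algorithm. Starting from a base training set, the sketch repeatedly accepts the crowd-labeled sample that most increases accuracy on a trusted held-out validation set, and it stops once no remaining sample helps, so mislabeled samples that lower accuracy are never accepted. A nearest-centroid classifier stands in for the SVM named in the keywords. The clean validation set, the strict-improvement stopping rule, and the names `centroid_accuracy` and `greedy_augment` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A labeled sample: (feature vector, class label). Hypothetical representation.
Sample = Tuple[Tuple[float, ...], str]


def centroid_accuracy(train: Sequence[Sample], validation: Sequence[Sample]) -> float:
    """Fit a nearest-centroid classifier on `train`; return its accuracy on `validation`.

    Assumption: a stand-in for the SVM in the record's keywords. `validation` must be
    non-empty and is assumed to carry trusted (clean) labels.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    correct = 0
    for x, y in validation:
        # Predict the label whose class centroid is closest in Euclidean distance.
        pred = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        correct += pred == y
    return correct / len(validation)


def greedy_augment(base, pool, validation, score=centroid_accuracy):
    """Greedily accept crowd-labeled samples that strictly raise the validation score.

    Returns (accepted samples, final score). Samples that never improve the score,
    including noisy ones that lower it, are left out (hypothetical stopping rule).
    """
    current = list(base)
    remaining = list(pool)
    accepted = []
    best = score(current, validation)
    while remaining:
        best_idx = None
        # Try each remaining candidate and remember the one with the highest score.
        for i, cand in enumerate(remaining):
            s = score(current + [cand], validation)
            if s > best:
                best, best_idx = s, i
        if best_idx is None:
            break  # no remaining candidate strictly improves the score
        chosen = remaining.pop(best_idx)
        current.append(chosen)
        accepted.append(chosen)
    return accepted, best
```

For example, with base set `[((0, 0), "a"), ((20, 20), "b")]`, crowd pool `[((12, 12), "a"), ((6, 6), "b")]`, and validation set `[((1, 1), "a"), ((8, 8), "b"), ((12, 12), "b")]`, the sketch raises validation accuracy from 2/3 to 1.0 by accepting `((6, 6), "b")` and rejects the mislabeled `((12, 12), "a")`. Each round retrains the classifier once per remaining candidate, so the worst case needs a quadratic number of retrainings; this is presumably why an incremental learner such as the incremental SVM listed in the keywords is attractive.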
Keywords :
image classification; learning (artificial intelligence); optical character recognition; search problems; OCR; active learning; character recognition problem; classifier; crowd labeled samples; greedy search; noisy data rejection; optical character recognition; Accuracy; Character recognition; Noise; Noise measurement; Optical character recognition software; Support vector machines; Training; Character recognition; Indian scripts; active learning; crowd sourcing; greedy search; incremental SVM;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.171