DocumentCode :
2012332
Title :
Learning Domain-Specific Feature Descriptors for Document Images
Author :
Ramakrishnan, Kandan ; Bart, Evgeniy
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
fYear :
2012
fDate :
27-29 March 2012
Firstpage :
415
Lastpage :
418
Abstract :
Many machine learning algorithms rely on feature descriptors to access information about image appearance. Using an appropriate descriptor is therefore crucial for the algorithm to succeed. Although domain- and task-specific feature descriptors may result in excellent performance, they currently have to be hand-crafted, a difficult and time-consuming process. In contrast, general-purpose descriptors (such as SIFT) are easy to apply and have proved successful for a variety of tasks, including classification, segmentation, and clustering. Unfortunately, most general-purpose feature descriptors are targeted at natural images and may perform poorly in document analysis tasks. In this paper, we propose a method for automatically learning feature descriptors tuned to a given image domain. The method works by first extracting the independent components of the images, and then building a descriptor by pooling these components over multiple overlapping regions. We test the proposed method on several document analysis tasks and several datasets, and show that it outperforms existing general-purpose feature descriptors.
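The pooling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the filters have already been obtained (for example by running ICA on image patches, which is not shown here), and the names `filter_responses`, `pool_overlapping`, `describe`, `FILTERS`, and `PATCH` are hypothetical. The two hand-written filters are only stand-ins for learned independent components, and the window size, stride, and use of summed absolute responses are illustrative choices.

```python
def filter_responses(image, filt):
    """Valid-mode correlation of a 2-D image (list of rows) with a small 2-D filter."""
    fh, fw = len(filt), len(filt[0])
    out_h = len(image) - fh + 1
    out_w = len(image[0]) - fw + 1
    return [
        [
            sum(image[r + i][c + j] * filt[i][j]
                for i in range(fh) for j in range(fw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def pool_overlapping(resp, size, stride):
    """Sum absolute responses over size x size windows; stride < size makes them overlap."""
    pooled = []
    for r in range(0, len(resp) - size + 1, stride):
        for c in range(0, len(resp[0]) - size + 1, stride):
            pooled.append(sum(abs(resp[r + i][c + j])
                              for i in range(size) for j in range(size)))
    return pooled


def describe(image, filters, size=2, stride=1):
    """Concatenate the pooled responses of every filter into one descriptor vector."""
    descriptor = []
    for filt in filters:
        descriptor.extend(pool_overlapping(filter_responses(image, filt), size, stride))
    return descriptor


# Assumption: hand-written stand-ins for filters that would be learned from data.
FILTERS = [
    [[-1, 1], [-1, 1]],   # responds to vertical edges
    [[-1, -1], [1, 1]],   # responds to horizontal edges
]

# Toy 4x4 patch containing a single vertical edge.
PATCH = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

descriptor = describe(PATCH, FILTERS)
print(descriptor)
```

For this toy patch the descriptor has eight entries, four per filter: the vertical-edge filter produces non-zero pooled values and the horizontal-edge filter produces zeros.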
Keywords :
data analysis; document image processing; feature extraction; image classification; image segmentation; learning (artificial intelligence); pattern clustering; SIFT descriptor; classification task; clustering task; document analysis task; document image; domain-specific feature descriptor; general-purpose descriptor; image appearance; machine learning algorithm; scale invariant feature transform; segmentation task; task-specific feature descriptor; Detectors; Dictionaries; Feature extraction; Image edge detection; Optical character recognition software; Text analysis; Visualization; Feature descriptors; classification; feature learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
Type :
conf
DOI :
10.1109/DAS.2012.49
Filename :
6195405