Title :
A Rotation Invariant Page Layout Descriptor for Document Classification and Retrieval
Author :
Gordo, Albert ; Valveny, Ernest
Author_Institution :
Comput. Vision Center, Univ. Autonoma de Barcelona, Barcelona, Spain
Abstract :
Document classification usually requires structural features, such as the physical layout, to obtain good accuracy rates on complex documents. This paper introduces a layout descriptor and a distance measure based on cyclic dynamic time warping that can be computed in O(n^2). The descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with the descriptor and its rotation-invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC in both accuracy and speed, particularly on rotated documents.
Keywords :
classification; computational complexity; document handling; information retrieval; Girona archive database; cyclic dynamic time warping; document classification; document retrieval; minimum weight edge cover; rotation invariant page layout descriptor; Computer vision; Databases; Earth; Feature extraction; Image segmentation; Optical character recognition software; Pixel; Text analysis; Time measurement; Tree graphs; retrieval; rotation invariant
Conference_Title :
10th International Conference on Document Analysis and Recognition (ICDAR '09), 2009
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2009.110