Title :
Retrieval of Handwritten Lines in Historical Documents
Author :
Schomaker, L.R.B.
Abstract :
This study describes methods for the retrieval of handwritten lines of text in a historical administrative collection. The goal is to develop generic methods for bootstrapping the retrieval system from a tabula rasa starting condition, i.e., the virtual absence of labeled samples. By exploiting currently available computing power and the fact that computation takes place offline, it should be possible to provide a good starting point for statistical learning methods. In this manner, a closed collection can be indexed incrementally. A cross-correlation method on line-strip images is presented, and its results are compared with those of feature-based methods.
Keywords :
administrative data processing; document image processing; feature extraction; handwriting recognition; information retrieval; learning (artificial intelligence); text analysis; bootstrapping; cross-correlation method; feature-based methods; handwritten line retrieval; handwritten text; historical administrative collection; historical documents; line-strip images; statistical learning; tabula rasa starting condition; Humans; Image quality; Image retrieval; Image segmentation; Labeling; Optical character recognition software; Strips; Text recognition
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4376984