Title :
Handwritten Information Extraction from Historical Census Documents
Author :
Nion, Thibauld ; Menasri, Fares ; Louradour, Jerome ; Sibade, Cedric ; Retornaz, Thomas ; Metaireau, Pierre-Yves ; Kermorvant, Christopher
Author_Institution :
A2iA, Paris, France
Abstract :
This paper describes a complete system for handwritten information extraction in historical documents. The system was evaluated under real conditions and at a large scale (8 million snippets) on the tables of the 1930 US Census. The table position was located with a registration algorithm that uses printed words as anchors. Rows and columns were extracted for nine different fields. For each field, a recognizer was trained, based either on convolutional neural networks for small-lexicon fields or on recurrent neural networks for large-lexicon fields. The system achieves very high data extraction performance, reaching an automation rate of more than 70% at an error rate similar to that of human keyers for a complete identity field.
Keywords :
document image processing; feature extraction; handwriting recognition; image registration; recurrent neural nets; US Census; convolutional neural networks; data extraction; handwritten information extraction; historical census documents; large lexicon fields; printed word anchors; recurrent neural networks; registration algorithm; small lexicon fields; Automation; Databases; Dictionaries; Error analysis; Handwriting recognition; Neural networks; Text analysis; Document layout analysis; Historical document processing; convolutional and recurrent neural networks
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.168