Title :
Model-Based Tabular Structure Detection and Recognition in Noisy Handwritten Documents
Author :
Chen, Jin; Lopresti, Daniel
Author_Institution :
Comput. Sci. & Eng., Lehigh Univ., Bethlehem, PA, USA
Abstract :
Tabular structure detection and recognition can be a valuable step in the analysis of unstructured documents. The noisy handwritten documents we analyze may contain pre-printed rulings as the substrate, hand-drawn rulings, machine-printed text, handwritten text, and signatures, in addition to the tabular structures that we wish to decompose into basic cells, rows, and columns. Although prior work has addressed machine-printed documents, noisy handwritten documents may require modified or new techniques. In this work, we detect tabular structures and decompose them into 2-D grids of table cells simultaneously. First, we detect "key points" that help determine the physical and logical structure of tables. Then, we use the 2-D grid assumption to build grids of key points. Finally, we extract structural features for the Min-Cut/Max-Flow algorithm to recognize tabular structures. Experiments on 22 tables containing 584 table cells show a cell precision of 100% and a cell recall of 93.3%.
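The final step of the abstract relies on Min-Cut/Max-Flow to separate tabular from non-tabular structure. The abstract does not give the graph construction or the features, so the following is only a minimal sketch of the general technique on a toy graph: candidate key points `a`, `b`, `c`, the terminal names `s` (table) and `t` (non-table), and all capacities are hypothetical values chosen for illustration, not the authors' model. It uses the Edmonds-Karp variant of max-flow and reads the labeling off the source side of the resulting cut.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (cut value, nodes on the source side)."""
    # Residual graph: copy forward capacities and add zero-capacity reverse edges.
    residual = {}
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def reachable_from_source():
        # Breadth-first search along edges that still have residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side.
            return flow, set(parent)
        # Find the bottleneck capacity along the shortest augmenting path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along that path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy problem (all numbers are illustrative assumptions).
# s -> x : cost of labelling x "non-table"; x -> t : cost of labelling x "table";
# x <-> y : penalty for giving neighbouring key points different labels.
capacity = {
    "s": {"a": 5, "b": 4, "c": 1},
    "a": {"t": 1, "b": 2},
    "b": {"t": 1, "a": 2, "c": 2},
    "c": {"t": 6, "b": 2},
}

cut_value, source_side = min_cut(capacity, "s", "t")
table_points = sorted(source_side - {"s"})
print(cut_value, table_points)
```

In this toy instance the cheapest cut keeps `a` and `b` on the table side and assigns `c` to the non-table side, because the pairwise penalty for separating `b` from `c` is smaller than the cost of labelling `c` as part of a table.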
Keywords :
document image processing; feature extraction; handwriting recognition; handwritten character recognition; minimax techniques; object detection; 2D grids; hand-drawn rulings; handwritten text; machine-printed text; max-flow algorithm; min-cut algorithm; model-based tabular structure detection; model-based tabular structure recognition; noisy handwritten document; preprinted rulings; signatures; table cells; Complexity theory; Computational modeling; Joining processes; Noise measurement; Substrates
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location :
Bari
Print_ISBN :
978-1-4673-2262-1
DOI :
10.1109/ICFHR.2012.233