Title :
Text Line Detection for Heterogeneous Documents
Author :
Diem, Markus ; Kleber, Florian ; Sablatnig, Robert
Author_Institution :
Comput. Vision Lab., Vienna Univ. of Technol., Vienna, Austria
Abstract :
Text line detection is a pre-processing step for automated document analysis tasks such as word spotting or OCR. It is additionally used for document structure analysis and layout analysis. For mixed layouts, degraded documents and handwritten documents, text line detection is still challenging. We present a novel approach that targets torn documents with varying layouts and writing. The proposed method is a bottom-up approach that fuses words so as to globally minimize their fusing distance. To improve processing time and support further layout analysis, text lines are represented by oriented rectangles. Even though the method was designed for modern handwritten and printed documents, tests on medieval manuscripts give promising results. Additionally, the text line detection was evaluated on the ICDAR 2009 and ICFHR 2010 Handwriting Segmentation Contest datasets.
Keywords :
document image processing; geometry; text detection; ICDAR 2009; ICFHR 2010 handwriting segmentation contest datasets; OCR; automated document analysis; degraded documents; document structure analysis; handwritten documents; heterogeneous documents; layout analysis; medieval manuscripts; mixed layouts; oriented rectangles; text line detection; torn documents; word spotting; Databases; Frequency modulation; Image segmentation; Layout; Noise; Text analysis; Writing; Document Analysis; Layout Analysis; Text Line Detection;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.152