Title :
Text line extraction for historical document images using steerable directional filters
Author :
Alaql, Omar; Lu, Cheng Chang
Author_Institution :
Dept. of Comput. Sci., Kent State Univ., Kent, OH, USA
Abstract :
Vast amounts of valuable historical documents held in libraries and various national archives have not yet been exploited electronically. Historical documents pose specific difficulties compared with other types of handwritten documents: because of their low quality and complexity, their analysis remains an open research field. One of the major steps in analyzing these documents is automatic text line extraction, which directly influences the accuracy of text recognition. The Center for Unified Biometrics and Sensors (CUBS) proposed one of the best-known approaches to text line extraction. In this paper, building on the concepts of the CUBS approach, we propose an approach to extract text lines from historical document images. The proposed approach is based on three local connectivity maps. The first holds the orientation angles of the text lines and is generated with a dynamic steerable directional filter. This map is then modified with a mode filter to determine the paragraph map of the document. Based on the values of the paragraph map, the adaptive local connectivity map (ALCM) is generated with a static steerable directional filter to estimate the locations of the text lines. The proposed approach solves the ALCM binarization problem of the CUBS approach and, in addition to segmenting text lines, also extracts the paragraphs of the document.
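The abstract names the three maps but not the filter kernels, so the following is only a minimal pure-Python sketch of how such a pipeline could fit together, not the authors' implementation. Assumptions (all hypothetical): the steerable directional filter is approximated by summing ink values along a rotated line segment with nearest-pixel sampling; orientations (the local connectivity directions map, LCDM, in the keywords) are estimated only at ink pixels; the mode filter votes only with those ink-pixel orientations; and the function names, the candidate angle set `ANGLES`, `half_len`, and `radius` are illustrative choices.

```python
import math
from collections import Counter


def line_response(img, r, c, theta, half_len):
    """Hypothetical directional filter: sum of ink values on 2*half_len+1
    nearest-pixel samples centred at (r, c) at angle theta (radians,
    0 = horizontal, counter-clockwise; image rows grow downward)."""
    h, w = len(img), len(img[0])
    dr, dc = -math.sin(theta), math.cos(theta)
    total = 0
    for t in range(-half_len, half_len + 1):
        rr, cc = round(r + t * dr), round(c + t * dc)
        if 0 <= rr < h and 0 <= cc < w:
            total += img[rr][cc]
    return total


def orientation_map(img, angles, half_len):
    """Dynamic step: at each ink pixel keep the candidate angle with the
    strongest response (first one on ties); background pixels get None."""
    h, w = len(img), len(img[0])
    return [[max(angles, key=lambda a: line_response(img, r, c, a, half_len))
             if img[r][c] else None
             for c in range(w)] for r in range(h)]


def mode_filter(amap, radius):
    """Paragraph map: most frequent non-None angle in a (2*radius+1)^2
    window around each cell, or None if the window holds no votes."""
    h, w = len(amap), len(amap[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            votes = Counter(
                amap[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
                if amap[rr][cc] is not None)
            row.append(votes.most_common(1)[0][0] if votes else None)
        out.append(row)
    return out


def alcm(img, pmap, half_len):
    """Static step: filter each pixel along its paragraph angle; cells
    without a paragraph angle get 0."""
    h, w = len(img), len(img[0])
    return [[line_response(img, r, c, pmap[r][c], half_len)
             if pmap[r][c] is not None else 0
             for c in range(w)] for r in range(h)]


# Usage: a 7x15 binary "page" holding one horizontal stroke on row 3.
ANGLES = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
page = [[0] * 15 for _ in range(7)]
for col in range(2, 13):
    page[3][col] = 1
omap = orientation_map(page, ANGLES, half_len=3)
pmap = mode_filter(omap, radius=2)
conn = alcm(page, pmap, half_len=3)
```

Restricting the orientation votes to ink pixels is a design choice made here so that weak, tie-prone responses from background pixels do not dominate the mode filter; a real implementation could weight votes by response strength instead.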
Keywords :
biometrics (access control); document image processing; feature extraction; history; libraries; records management; text analysis; ALCM; CUBS; Center for Unified Biometrics and Sensors; adaptive local connectivity map; document analysis; handwritten documents; historical document images; libraries; national archives; paragraph map; static steerable directional filter; text line extraction; text recognition; Educational institutions; Filtering algorithms; Image segmentation; Kernel; Level set; Libraries; Optical filters; adaptive local connectivity map (ALCM); local connectivity directions map (LCDM); paragraph map;
Conference_Title :
Audio, Language and Image Processing (ICALIP), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-3902-2
DOI :
10.1109/ICALIP.2014.7009807