Title :
A trainable, single-pass algorithm for column segmentation
Author :
Sylwester, Don ; Seth, Sharad
Author_Institution :
Dept. of Comput. Sci., Concordia Coll., Seward, NE, USA
Abstract :
Column segmentation logically precedes OCR in the document analysis process. The trainable algorithm XYCUT relies on horizontal and vertical binary profiles to produce an XY-tree representing the column structure of a page of a technical document in a single pass through the bit image. Training against ground truth adjusts a single, resolution-independent parameter, using only local information and guided by an edit distance function. The algorithm correctly segments the page image for a fairly wide range of parameter values, although small, local, and repairable errors may occur; their effect is measured by a repair cost function.
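The abstract gives no implementation details, so the following is only a minimal sketch of the general XY-cut idea it names: binary projection profiles are scanned for empty gaps, and the page is split recursively into an XY-tree. It is a recursive illustration, not the single-pass XYCUT procedure of the paper. The names `binary_profile`, `widest_gap`, and `xy_cut` are hypothetical, and `min_gap` is an assumed pixel threshold standing in for the paper's single trained parameter (which the abstract describes as resolution independent; this one is not).

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]          # 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), half-open


def binary_profile(img: Image, box: Box, axis: str) -> List[bool]:
    """True where a row ('rows') or column ('cols') of the box contains ink."""
    top, bottom, left, right = box
    if axis == "rows":
        return [any(img[r][left:right]) for r in range(top, bottom)]
    return [any(img[r][c] for r in range(top, bottom)) for c in range(left, right)]


def widest_gap(profile: List[bool], min_gap: int) -> Optional[Tuple[int, int]]:
    """Widest empty run (start, end) of length >= min_gap in a trimmed profile."""
    best, start = None, None
    for i, ink in enumerate(profile):
        if not ink:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def xy_cut(img: Image, min_gap: int, box: Optional[Box] = None) -> Optional[Dict]:
    """Recursively split the box at its widest gap; return an XY-tree node."""
    if box is None:
        box = (0, len(img), 0, len(img[0]) if img else 0)
    top, bottom, left, right = box
    rows = binary_profile(img, box, "rows")
    if not any(rows):
        return None  # blank region: nothing to segment
    cols = binary_profile(img, box, "cols")

    # Shrink the box to the extent of the ink so every remaining gap is interior.
    r = [i for i, v in enumerate(rows) if v]
    c = [i for i, v in enumerate(cols) if v]
    rows, cols = rows[r[0]:r[-1] + 1], cols[c[0]:c[-1] + 1]
    top, bottom = top + r[0], top + r[-1] + 1
    left, right = left + c[0], left + c[-1] + 1

    node = {"box": (top, bottom, left, right), "cut": None, "children": []}
    h_gap = widest_gap(rows, min_gap)  # empty rows: a horizontal cut
    v_gap = widest_gap(cols, min_gap)  # empty columns: a vertical cut
    if h_gap is None and v_gap is None:
        return node  # leaf block

    # Assumption of this sketch: prefer the wider gap, vertical on ties.
    if v_gap is not None and (h_gap is None or v_gap[1] - v_gap[0] >= h_gap[1] - h_gap[0]):
        node["cut"] = "vertical"
        parts = [(top, bottom, left, left + v_gap[0]),
                 (top, bottom, left + v_gap[1], right)]
    else:
        node["cut"] = "horizontal"
        parts = [(top, top + h_gap[0], left, right),
                 (top + h_gap[1], bottom, left, right)]
    node["children"] = [xy_cut(img, min_gap, p) for p in parts]
    return node
```

Calling `xy_cut(page, min_gap=2)` on a binary page image returns a nested dictionary whose leaves are the detected blocks. Raising `min_gap` merges blocks separated by narrower gaps, which is the kind of parameter sensitivity the abstract's training step addresses.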
Keywords :
document image processing; errors; image representation; image segmentation; learning (artificial intelligence); optical character recognition; technical presentation; OCR; XY-tree; XYCUT; column segmentation; document analysis; edit distance function; ground truth; horizontal profiles; page image segmentation; page structure; repair cost function; resolution independent parameter; technical document; trainable single-pass algorithm; vertical binary profiles; Algorithm design and analysis; Computer science; Cost function; Educational institutions; Image segmentation; Optical character recognition software; Pixel; Robustness; Size measurement; Text analysis
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.601971