Title :
A modified recursive x-y cut algorithm for solving block ordering problems
Author :
Sutheebanjard, Phaisarn ; Premchaiswadi, Wichian
Author_Institution :
Grad. Sch. of Inf. Technol., Siam Univ., Bangkok, Thailand
Abstract :
To achieve the best results from an OCR system, the pre-processing steps must be performed with a high degree of accuracy and reliability. There are two critically important steps in the OCR pre-processing phase. First, blocks must be extracted from each page of the scanned document. Second, all blocks resulting from the first step must be arranged in the correct order. One of the most notable techniques for block ordering in the second step is the recursive x-y cut (RXYC) algorithm. This technique works accurately only when applied to documents with a simple page layout; it produces incorrect block ordering when applied to documents with complex page layouts. This paper proposes a modified recursive x-y cut algorithm for solving block ordering problems for documents with complex page layouts. The proposed algorithm can solve problems such as (1) the overlapping block problem, (2) the blocks overlay problem, and (3) the L-Shaped block problem.
Keywords :
document image processing; optical character recognition; L-Shaped block problem; OCR preprocessing phase; OCR system; block ordering problem; blocks overlay problem; complex page layouts; modified recursive x-y cut algorithm; overlapping block problem; scanned document; Character recognition; Costs; Data mining; Electronic publishing; Information technology; Optical character recognition software; Problem-solving; block ordering; recursive x-y cut
Conference_Title :
Computer Engineering and Technology (ICCET), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6347-3
DOI :
10.1109/ICCET.2010.5485882