Title :
Document layout analysis and reading order determination for a reading robot
Author :
Pan, Yucun ; Zhao, Qunfei ; Kamata, Seiichiro
Author_Institution :
Sch. of Electron., Inf. & Electr. Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
This paper proposes an efficient approach to document layout analysis and reading order determination for a reading robot. First, the input document images are preprocessed to remove noise, connect lines and domains, and reduce the computation time. Second, a bottom-up, parameter-independent, two-step layout analysis algorithm based on morphology outlines the geometry of the maximal homogeneous regions and classifies them into text, tables, and pictures. Finally, the reading order is determined by a top-down recursive hierarchy algorithm derived from XY-cut, using a set of rules that depend on layout information. Important parameters are obtained from statistical information of the given images so that the method adapts to different types of documents. The proposed algorithm is applied to a large number of document images, and the experimental results show that it enables the reading robot to read paper documents in different languages, even those with complex layout structures.
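The abstract does not give the rule set used for reading order, so the following is only a minimal sketch of the classic recursive XY-cut that the method is said to derive from, not the authors' algorithm. It assumes the layout analysis has already produced region bounding boxes; the `Box` representation and the names `split_by_gaps` and `xy_cut_order` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical region representation: (x0, y0, x1, y1) in pixels,
# with the origin at the top-left corner of the page.
Box = Tuple[int, int, int, int]


def split_by_gaps(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes into runs separated by empty gaps along one axis.

    axis = 0 projects onto x (vertical cuts between columns),
    axis = 1 projects onto y (horizontal cuts between rows).
    """
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # farthest extent covered so far
    for box in ordered[1:]:
        if box[axis] >= reach:
            # Nothing covers this position, so a cut is possible here.
            groups.append([box])
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def xy_cut_order(boxes: List[Box]) -> List[Box]:
    """Return boxes in reading order using plain recursive XY-cut.

    Assumption: top-to-bottom, then left-to-right reading, with
    horizontal cuts tried before vertical cuts at every level.
    """
    if len(boxes) <= 1:
        return list(boxes)
    for axis in (1, 0):  # try horizontal cuts first, then vertical
        groups = split_by_gaps(boxes, axis)
        if len(groups) > 1:
            ordered: List[Box] = []
            for group in groups:
                ordered.extend(xy_cut_order(group))
            return ordered
    # No clean cut exists: fall back to a simple positional sort.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Example page: a full-width title above two columns whose
# paragraphs are not vertically aligned with each other.
title = (0, 0, 100, 10)
left1, left2 = (0, 20, 45, 55), (0, 60, 45, 90)
right1, right2 = (55, 20, 100, 40), (55, 45, 100, 90)

reading_order = xy_cut_order([right2, left2, title, right1, left1])
```

In this example the title is separated first by a horizontal cut, the remaining regions are split into two columns by a vertical cut, and each column is then read top to bottom. Plain XY-cut reads across columns when their paragraphs happen to be aligned, which is one reason a practical system would add layout-dependent rules on top of it.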
Keywords :
document image processing; optical character recognition; robot vision; XY-cut; computation time reduction; document images; document layout analysis; layout information; reading order determination; reading robot; statistic information; top-down recursive hierarchy algorithm; two step layout analysis algorithm; adaptive; hierarchy; layout analysis; morphology based
Conference_Title :
TENCON 2010 - 2010 IEEE Region 10 Conference
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4244-6889-8
DOI :
10.1109/TENCON.2010.5686038