DocumentCode
2016464
Title
On Segmentation of Documents in Complex Scripts
Author
Kumar, K. S. Sesh ; Kumar, Sukesh ; Jawahar, C.V.
Author_Institution
Int. Inst. of Inf. Technol., Hyderabad
Volume
2
fYear
2007
fDate
23-26 Sept. 2007
Firstpage
1243
Lastpage
1247
Abstract
Document image segmentation algorithms primarily aim at separating text and graphics in the presence of complex layouts. However, for many non-Latin scripts, segmentation becomes a challenge due to the characteristics of the script. In this paper, we empirically demonstrate that algorithms that are successful for Latin scripts may not be very effective for Indic and other complex scripts. We explain this based on the differences in the spatial distribution of symbols in the scripts. We argue that the visual information used for segmentation needs to be enhanced with other information, such as script models, for accurate results.
Keywords
computer graphics; document image processing; image segmentation; natural language processing; text analysis; Indic scripts; complex layouts; complex scripts; document image segmentation algorithms; document segmentation; graphics; non-Latin scripts; spatial distribution; text; Algorithm design and analysis; Data mining; Failure analysis; Graphics; Image segmentation; Image texture analysis; Information technology; Layout; Shape measurement; Text analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location
Parana
ISSN
1520-5363
Print_ISBN
978-0-7695-2822-9
Type
conf
DOI
10.1109/ICDAR.2007.4377114
Filename
4377114