Title :
Text Line Segmentation of Historical Arabic Documents
Author :
Zahour, Abderrazak ; Likforman-Sulem, Laurence ; Boussalaa, W. ; Taconet, Bruno
Author_Institution :
Univ. du Havre/GED, Le Havre
Abstract :
This paper presents a text line segmentation method for printed or handwritten historical Arabic documents. Documents are first classified into two classes using a K-means scheme; these classes correspond to document complexity (easy or not easy to segment). A document that includes overlapping and touching characters is then divided into vertical strips. The text blocks extracted from each strip by horizontal projection are classified into three categories: small, average, and large. After the large text blocks are segmented, text lines are obtained by matching adjacent blocks in two successive strips according to their spatial relationships. A document without overlapping or touching characters is segmented in the same way, omitting the large-text-block segmentation module. The method achieves 96% accuracy on a collection of 100 historical documents.
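The abstract does not list the features given to the K-means step. The Python sketch below assumes each document is summarized by a small hypothetical feature vector (for instance, a touching/overlap ratio as feature 0 plus a second complexity measure) and clusters these vectors into two groups with plain Lloyd's K-means (k = 2). The names `kmeans2` and `classify_documents` and the feature layout are illustrative, not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def kmeans2(points: Sequence[Vector], max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm with k = 2.

    Deterministic initialization: the first point and the point farthest from it.
    Returns (labels, centroids).
    """
    if len(points) < 2:
        raise ValueError("need at least two documents")
    first = list(points[0])
    farthest = list(max(points, key=lambda p: math.dist(p, first)))
    centroids = [first, farthest]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: attach each document to its nearest centroid (Euclidean).
        new_labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def classify_documents(features: Sequence[Vector]) -> List[str]:
    """Label each document 'easy' or 'not easy' to segment.

    Hypothetical assumption: feature 0 grows with the amount of touching or
    overlapping ink, so the cluster with the larger mean feature 0 is 'not easy'.
    """
    labels, centroids = kmeans2(features)
    hard = max((0, 1), key=lambda k: centroids[k][0])
    return ["not easy" if lab == hard else "easy" for lab in labels]
```

With the made-up features `[(0.05, 0.02), (0.08, 0.03), (0.6, 0.5), (0.7, 0.55), (0.04, 0.01)]`, the third and fourth documents fall into the "not easy" cluster and the others into the "easy" one.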
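Strip division and block extraction can be sketched as follows, assuming a binary page stored as a list of pixel rows (1 = ink) and strips of equal width. The abstract gives neither the number nor the width of the strips, so `n_strips` is a free, hypothetical parameter. Within each strip, a text block is taken here to be a maximal run of rows whose horizontal projection is non-zero.

```python
from typing import List, Tuple

Image = List[List[int]]   # binary image as rows of pixels: 1 = ink, 0 = background
Block = Tuple[int, int]   # (top, bottom) row indices, inclusive


def vertical_strips(img: Image, n_strips: int) -> List[Image]:
    """Split the page into n_strips column bands of (nearly) equal width."""
    width = len(img[0])
    edges = [round(i * width / n_strips) for i in range(n_strips + 1)]
    return [[row[edges[i]:edges[i + 1]] for row in img] for i in range(n_strips)]


def horizontal_projection(strip: Image) -> List[int]:
    """Number of ink pixels in each row of the strip."""
    return [sum(row) for row in strip]


def text_blocks(strip: Image) -> List[Block]:
    """Maximal runs of rows whose projection value is non-zero."""
    blocks: List[Block] = []
    start = None
    for y, value in enumerate(horizontal_projection(strip)):
        if value > 0 and start is None:
            start = y                      # a block begins
        elif value == 0 and start is not None:
            blocks.append((start, y - 1))  # a blank row closes the block
            start = None
    if start is not None:                  # block touching the bottom edge
        blocks.append((start, len(strip) - 1))
    return blocks
```

Running `text_blocks` on every element of `vertical_strips(page, n_strips)` yields one list of `(top, bottom)` blocks per strip.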
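The abstract says neither how block sizes are thresholded nor how large blocks are cut. As a hypothetical stand-in, the sketch below labels blocks by comparing each height with the mean block height of the strip, and splits each large block into a number of equal slices given by its height divided by that mean, rounded. The 0.5 and 1.5 ratios and the equal-slice rule are assumptions; other rules, such as cutting at minima of the projection profile, are equally plausible.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (top, bottom) row indices, inclusive


def heights(blocks: List[Block]) -> List[int]:
    return [bottom - top + 1 for top, bottom in blocks]


def classify_blocks(blocks: List[Block], small: float = 0.5, large: float = 1.5) -> List[str]:
    """Label blocks 'small', 'average' or 'large' relative to the mean block height.

    The small/large ratios are illustrative thresholds, not values from the paper.
    """
    if not blocks:
        return []
    hs = heights(blocks)
    mean_h = sum(hs) / len(hs)
    return ["small" if h < small * mean_h else "large" if h > large * mean_h else "average"
            for h in hs]


def split_large_block(block: Block, line_height: float) -> List[Block]:
    """Cut a block into k = round(height / line_height) slices of near-equal height."""
    top, bottom = block
    h = bottom - top + 1
    k = max(1, round(h / line_height))
    edges = [top + round(i * h / k) for i in range(k + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(k)]


def segment_blocks(blocks: List[Block]) -> List[Block]:
    """Replace every large block by its slices; keep small and average blocks unchanged."""
    if not blocks:
        return []
    labels = classify_blocks(blocks)
    mean_h = sum(heights(blocks)) / len(blocks)
    out: List[Block] = []
    for block, label in zip(blocks, labels):
        out.extend(split_large_block(block, mean_h) if label == "large" else [block])
    return out
```

For the blocks `[(0, 2), (4, 6), (8, 8), (10, 17)]` the mean height is 3.75, so the one-row block is small, the eight-row block is large and is cut into `(10, 13)` and `(14, 17)`.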
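Line construction can be illustrated by chaining blocks from left to right: a block in one strip continues the line of the block in the previous strip with which it shares the largest vertical overlap. This overlap rule is an assumed reading of the abstract's "spatial relationships"; the function names and the greedy one-to-one matching are illustrative choices, not the paper's procedure.

```python
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int]   # (top, bottom) row indices, inclusive
Ref = Tuple[int, int]     # (strip index, block index within that strip)


def overlap(a: Block, b: Block) -> int:
    """Number of rows shared by two blocks (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def build_lines(strips: List[List[Block]]) -> List[List[Ref]]:
    """Chain blocks of successive strips into text lines, left to right."""
    lines: List[List[Ref]] = []
    open_lines: Dict[int, int] = {}  # block index in previous strip -> line id
    for s, blocks in enumerate(strips):
        prev = strips[s - 1] if s > 0 else []
        next_open: Dict[int, int] = {}
        for j, block in enumerate(blocks):
            # Pick the still-unmatched previous block with the largest positive overlap.
            best: Optional[int] = None
            best_ov = 0
            for i, prev_block in enumerate(prev):
                ov = overlap(prev_block, block)
                if i in open_lines and ov > best_ov:
                    best, best_ov = i, ov
            if best is None:
                line_id = len(lines)  # no match: start a new line
                lines.append([])
            else:
                line_id = open_lines.pop(best)  # each previous block continues one line only
            lines[line_id].append((s, j))
            next_open[j] = line_id
        open_lines = next_open
    return lines
```

Greedy matching in top-to-bottom order keeps the sketch short. A line ends as soon as a strip contains no overlapping block, so lines with gaps would need an additional rule.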
Keywords :
image matching; image segmentation; natural language processing; text analysis; K-means scheme; handwritten historical Arabic document; printed historical Arabic document; spatial relationship; text block extraction; text line segmentation; Character recognition; Engines; Handwriting recognition; Image converters; Image segmentation; Level set; Strips; Testing; Text analysis
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4378691