DocumentCode :
591989
Title :
An Unconstrained Benchmark Urdu Handwritten Sentence Database with Automatic Line Segmentation
Author :
Raza, Arif ; Siddiqi, Imran ; Abidi, Abdessalem ; Arif, Fahim
Author_Institution :
Nat. Univ. of Sci. & Technol., Islamabad, Pakistan
fYear :
2012
fDate :
18-20 Sept. 2012
Firstpage :
491
Lastpage :
496
Abstract :
In this paper we present a novel off-line sentence database of Urdu handwritten documents, along with a few preprocessing and text line segmentation procedures. Despite increased research interest in Urdu handwritten document analysis in recent years, a standard benchmark dataset that could be used in Urdu handwriting recognition tasks has been missing. We have developed an Urdu handwritten database based on our own updated corpus, named CENIP-UCCP (Center for Image Processing-Urdu Corpus Construction Project). The corpus is a collection of a variety of Urdu texts that were used to generate forms. These forms were subsequently filled in by native writers in their natural handwriting. Six categories of text were used, with approximately 66 forms generated per category. To date, the database comprises 400 digitized forms produced by 200 different writers. The database is completely labeled for content information as well as content detection, and supports the evaluation of systems such as Urdu handwriting recognition, line segmentation and writer identification. The proposed Urdu text line segmentation scheme was also evaluated on the database, yielding promising segmentation results.
Keywords :
document image processing; handwritten character recognition; image segmentation; natural language processing; text analysis; CENIP-UCCP; Center for Image Processing-Urdu Corpus Construction Project; Urdu handwriting recognition task; Urdu handwritten database; Urdu handwritten document analysis; Urdu text line segmentation; automatic line segmentation; content detection; content information; natural handwriting; off-line sentence database; unconstrained benchmark Urdu handwritten sentence database; writer identification; Benchmark testing; Databases; Handwriting recognition; Image segmentation; Labeling; Text analysis; Writing; Handwriting recognition; Urdu; corpus; database;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location :
Bari
Print_ISBN :
978-1-4673-2262-1
Type :
conf
DOI :
10.1109/ICFHR.2012.177
Filename :
6424443