Title :
Binarization and its evaluation for Urdu Nastalique document images
Author :
Naz, Mamoona ; ul Ain Akram, Qurat ; Hussain, Shiraz
Author_Institution :
Center for Language Eng., Al-Khawarizmi Inst. of Comput. Sci., Univ. of Eng. & Technol., Lahore, Pakistan
Abstract :
Binarization converts a color or grayscale image into a black-and-white image and is normally a preliminary step in optical character recognition. Binarizing images of Urdu documents written in the Nastalique style requires particular attention because Nastalique is not written with a uniform stroke but as a sequence of thin and thick strokes with a variety of marks. In the current work, three binarization methods are compared to determine an accurate and efficient technique for Urdu. This technique is then tuned for Urdu document images in the Nastalique style so that thin character connections are not broken and, at the same time, diacritics are not joined to main bodies by thickened strokes.
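As a rough illustration of what binarization does, the sketch below thresholds a small grayscale image with Otsu's global method. The abstract does not name the three methods that were compared, so Otsu's method is an assumption chosen only because it is a widely used baseline, not the technique selected or tuned in the paper. The function names `otsu_threshold` and `binarize` and the sample pixel values are hypothetical.

```python
def otsu_threshold(gray):
    """Return the gray level (0..255) that maximizes between-class variance.

    `gray` is a list of rows of integer intensities in 0..255.
    Illustrative only: not the method evaluated in the paper.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break             # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold=None):
    """Map pixels <= threshold to 0 (ink) and all others to 255 (paper)."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[0 if v <= threshold else 255 for v in row] for row in gray]


# Hypothetical 2x3 image: dark ink (20..30) on a bright page (200..220).
sample = [[20, 30, 200],
          [25, 210, 220]]
binary = binarize(sample)
```

A single global threshold like this is the simplest case. For a script with alternating thin and thick strokes, the trade-off described in the abstract shows up in the threshold choice: too low a value tends to break thin joins, too high a value tends to thicken strokes until diacritics touch the main bodies.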
Keywords :
document image processing; optical character recognition; Nastalique writing style; Urdu Nastalique document images; Urdu language documents; binarization methods; colored image; gray scale image; Accuracy; Character recognition; Lighting; Optical character recognition software; Optical imaging; Standards; Writing; Urdu Optical Character Recognition; Urdu image corpus; binarization
Conference_Title :
Multi Topic Conference (INMIC), 2013 16th International
Conference_Location :
Lahore
DOI :
10.1109/INMIC.2013.6731352