• DocumentCode
    679793
  • Title
    Binarization and its evaluation for Urdu Nastalique document images
  • Author
    Naz, Mamoona; ul Ain Akram, Qurat; Hussain, Shiraz
  • Author_Institution
    Center for Language Eng., Al-Khawarizmi Inst. of Comput. Sci., Univ. of Eng. & Technol., Lahore, Pakistan
  • fYear
    2013
  • fDate
    19-20 Dec. 2013
  • Firstpage
    213
  • Lastpage
    218
  • Abstract
    Binarization converts a color or grayscale image into a black-and-white image and is normally a preliminary step in optical character recognition. Binarization of images of Urdu language documents written in the Nastalique writing style requires particular attention because Nastalique is not written with a uniform stroke but as a sequence of thin and thick strokes with a variety of marks. In the current work, three binarization methods are compared to determine an accurate and efficient technique for Urdu. This technique is further tuned for binarizing Urdu document images written in the Nastalique writing style, so that thin character connections are not broken and, at the same time, diacritics are not joined to main bodies by thickened strokes.
  • Keywords
    document image processing; optical character recognition; Nastalique writing style; Urdu Nastalique document images; Urdu language documents; binarization methods; colored image; gray scale image; Accuracy; Character recognition; Lighting; Optical character recognition software; Optical imaging; Standards; Writing; Urdu Optical Character Recognition; Urdu image corpus; binarization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 16th International Multi Topic Conference (INMIC)
  • Conference_Location
    Lahore
  • Type
    conf
  • DOI
    10.1109/INMIC.2013.6731352
  • Filename
    6731352
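
The abstract describes binarization as mapping a color or grayscale image to black and white. A minimal sketch of that step is given below, assuming a single global threshold chosen by Otsu's method. This record does not name the three methods the paper compares, so Otsu's method is used here only as a well-known illustration and is not claimed to be one of them. The function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance.

    Illustrative only: Otsu's method is an assumption, not taken from the paper.
    `pixels` is a flat list of integer gray levels in the range 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_bg = 0          # number of pixels at or below the candidate threshold
    sum_bg = 0        # sum of gray levels of those pixels
    best_t, best_var = 0, -1.0

    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue              # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                 # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image, threshold):
    """Map gray levels <= threshold to 0 (ink) and the rest to 255 (paper)."""
    return [[0 if p <= threshold else 255 for p in row] for row in image]


# Tiny made-up grayscale image: dark ink on a light background.
image = [[20, 30, 200],
         [25, 210, 220]]
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
```

A single global threshold is the simplest case. The abstract's concern about thin connections and thickened strokes is about how such a threshold (or a local variant) is tuned, which this sketch does not attempt.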