• DocumentCode
    551906
  • Title

    Conversion of Urdu Nastaliq to Roman Urdu using OCR

  • Author

    Iqbal, Faiza ; Latif, Aisha ; Kanwal, Nazia ; Altaf, Tayyaba

  • Author_Institution
    Nat. Univ. of Sci. & Technol., Islamabad, Pakistan
  • fYear
    2011
  • fDate
    16-18 Aug. 2011
  • Firstpage
    19
  • Lastpage
    22
  • Abstract
    This paper deals with Urdu Nastaliq, a popular script for writing the Urdu language. The complex, cursive nature of Nastaliq makes optical character recognition for this script very challenging. Segmentation of Urdu Nastaliq is also difficult because characters take different levels and shapes depending on their position in a word. A character-based segmentation technique for Urdu names is proposed in this paper. After segmentation, each character is matched against templates already saved in a database. The proposed technique handles many complexities in converting Urdu names to Roman Urdu, including the vowel sounds a single character produces due to diacritics and the different sounds of ye, ain and other such characters. Finally, names written in Nastaliq script are converted into Roman Urdu.
  • Keywords
    natural language processing; optical character recognition; OCR; Urdu language; character based segmentation technique; complex nature; cursive nature; optical character recognition; roman urdu; urdu names; urdu nastaliq conversion;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Interaction Sciences (ICIS), 2011 4th International Conference on
  • Conference_Location
    Busan
  • Print_ISBN
    978-1-4577-0480-2
  • Electronic_ISBN
    978-89-88678-45-9
  • Type

    conf

  • Filename
    6014524