Title :
Conversion of Urdu Nastaliq to Roman Urdu Using OCR
Author :
Iqbal, Faiza ; Latif, Aisha ; Kanwal, Nazia ; Altaf, Tayyaba
Author_Institution :
Nat. Univ. of Sci. & Technol., Islamabad, Pakistan
Abstract :
This paper deals with Urdu Nastaliq, a popular script for writing the Urdu language. The complex and cursive nature of Nastaliq makes optical character recognition (OCR) for this script very challenging. Segmentation of Urdu Nastaliq is also difficult because a character takes different shapes and sits at different levels depending on its position in a word. This paper proposes a character-based segmentation technique for Urdu names. After segmentation, each character is matched against templates already stored in a database. The proposed technique handles many complexities in converting Urdu names to Roman Urdu, including the vowel sounds a single character produces because of diacritics and the different sounds of characters such as ye and ain. Finally, names written in Nastaliq script are converted into Roman Urdu.
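The abstract names two steps, matching each segmented character against stored templates and then converting the recognised letters and diacritics to Roman Urdu, without giving the matching metric or the conversion rules. The sketch below is a minimal illustration under stated assumptions, not the authors' method: glyphs are assumed to be size-normalised binary bitmaps compared by counting differing pixels (a Hamming distance), diacritics (zabar, zer, pesh) are assumed to be recognised separately, and the rule tables for ye and ain are hypothetical. All function and table names (`hamming`, `match_glyph`, `transliterate`, `CONSONANTS`, `DIACRITICS`) are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A segmented character image as a binary bitmap (1 = ink, 0 = background).
Glyph = List[List[int]]
# One recognised character: (letter, diacritic name or None).
Token = Tuple[str, Optional[str]]

# Urdu letters as code points, so the tables do not depend on editor rendering.
ALIF, BE, RE, ZE = "\u0627", "\u0628", "\u0631", "\u0632"
SEEN, AIN, LAM, MEEM = "\u0633", "\u0639", "\u0644", "\u0645"
NOON, YE = "\u0646", "\u06cc"

# Hypothetical rule tables for illustration; not taken from the paper.
CONSONANTS = {BE: "b", RE: "r", ZE: "z", SEEN: "s", LAM: "l", MEEM: "m", NOON: "n"}
DIACRITICS = {"zabar": "a", "zer": "i", "pesh": "u"}


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two bitmaps of the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same shape")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_glyph(glyph: Glyph, templates: Dict[str, Glyph]) -> str:
    """Return the label of the stored template with the fewest differing pixels."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


def transliterate(tokens: Sequence[Token]) -> str:
    """Turn recognised letters plus diacritics into a capitalised Roman Urdu name."""
    parts = []
    for i, (letter, mark) in enumerate(tokens):
        vowel = DIACRITICS[mark] if mark else ""
        if letter in CONSONANTS:
            parts.append(CONSONANTS[letter] + vowel)
        elif letter == ALIF:
            # Alif is voiced as "a" unless a diacritic gives it another vowel.
            parts.append(vowel or "a")
        elif letter == YE:
            # Word-initial or vowelled ye acts as the consonant "y"; else as the vowel "i".
            parts.append("y" + vowel if i == 0 or mark else "i")
        elif letter == AIN:
            # Ain gets no consonant sound here; only its diacritic is voiced.
            parts.append(vowel)
        else:
            raise KeyError(f"no transliteration rule for {letter!r}")
    return "".join(parts).capitalize()


if __name__ == "__main__":
    # Toy 3x3 templates; real templates would be full glyph images.
    templates = {
        ALIF: [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        BE: [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    }
    noisy = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
    print(match_glyph(noisy, templates) == ALIF)  # True: 1 differing pixel vs 4
    # Ali: ain with zabar, lam, ye
    print(transliterate([(AIN, "zabar"), (LAM, None), (YE, None)]))  # Ali
```

Nearest-template matching by pixel mismatch is the simplest possible choice; it only works if segmentation has already normalised each glyph to the template size, which is exactly where Nastaliq's position-dependent shapes make a real system harder.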
Keywords :
natural language processing; optical character recognition; OCR; Urdu language; character-based segmentation technique; complex nature; cursive nature; Roman Urdu; Urdu names; Urdu Nastaliq conversion
Conference_Title :
Interaction Sciences (ICIS), 2011 4th International Conference on
Conference_Location :
Busan
Print_ISBN :
978-1-4577-0480-2
Electronic_ISBN :
978-89-88678-45-9