DocumentCode
551906
Title
Conversion of Urdu Nastaliq to Roman Urdu using OCR
Author
Iqbal, Faiza ; Latif, Aisha ; Kanwal, Nazia ; Altaf, Tayyaba
Author_Institution
Nat. Univ. of Sci. & Technol., Islamabad, Pakistan
fYear
2011
fDate
16-18 Aug. 2011
Firstpage
19
Lastpage
22
Abstract
This paper deals with Urdu Nastaliq, a popular script for writing the Urdu language. The complex, cursive nature of Nastaliq makes optical character recognition for this script very challenging. Segmentation of Urdu Nastaliq is also very complex, because characters appear at different levels and take different shapes depending on their position in a word. This paper proposes a character-based segmentation technique for Urdu names. Each segmented character is then matched against templates already stored in a database. The proposed technique handles many complexities in converting Urdu names to Roman Urdu, including the different vowel sounds that a single character produces depending on its diacritics and the different sounds of ye, ain, and other such characters. Finally, names written in Nastaliq script are converted into Roman Urdu.
Keywords
natural language processing; optical character recognition; OCR; Urdu language; character-based segmentation technique; complex nature; cursive nature; Roman Urdu; Urdu names; Urdu Nastaliq conversion
fLanguage
English
Publisher
IEEE
Conference_Titel
Interaction Sciences (ICIS), 2011 4th International Conference on
Conference_Location
Busan
Print_ISBN
978-1-4577-0480-2
Electronic_ISBN
978-89-88678-45-9
Type
conf
Filename
6014524
Link To Document