Title :
Rule Based Part of Speech Tagging of Sindhi Language
Author :
Mahar, Javed Ahmed ; Memon, Ghulam Qadir
Author_Institution :
Dept. of Comput. Sci., Shah Abdul Latif Univ., Khairpur, Pakistan
Abstract :
Part-of-speech (POS) tagging is the process of assigning the correct syntactic category to each word in a text. A tag set and word disambiguation rules are fundamental parts of any POS tagger. No work on a tag set for the Sindhi language has been published so far, and no Sindhi lexicon for computational processing is available. In this study, a tag set for Sindhi POS tagging, a lexicon, and word disambiguation rules are designed and developed. The Sindhi corpus is collected from a comprehensive Sindhi dictionary and is based on the most recent available vocabulary used by local people. This paper presents the preliminary achievements of a rule-based Sindhi part-of-speech (SPOS) tagger. Tagging and tokenization algorithms are also designed for the implementation of SPOS. The outputs of SPOS are verified by a Sindhi linguist. The development of the SPOS tagger may be an important milestone towards computational Sindhi language processing.
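The pipeline the abstract names (tokenization, lexicon lookup, and rule-based word disambiguation) can be illustrated with a minimal sketch. This is not the authors' SPOS implementation: the lexicon entries, the tag names (`DET`, `NOUN`, `VERB`, `PRON`, `UNK`), the two context rules, and the function names `tokenize`, `disambiguate`, and `tag` are all hypothetical, and English placeholder words stand in for Sindhi vocabulary.

```python
# Hypothetical toy lexicon: word -> set of candidate tags.
# These entries and tag names are illustrative, not the paper's tag set.
LEXICON = {
    "she": {"PRON"},
    "reads": {"VERB"},
    "the": {"DET"},
    "a": {"DET"},
    "book": {"NOUN", "VERB"},  # ambiguous word
}


def tokenize(text):
    """Split on whitespace and strip a few trailing/leading punctuation marks."""
    tokens = (piece.strip(".,;:!?") for piece in text.lower().split())
    return [token for token in tokens if token]


def disambiguate(prev_tag, candidates):
    """Pick one tag for an ambiguous word using hypothetical context rules."""
    if prev_tag == "DET" and "NOUN" in candidates:
        return "NOUN"  # assumed rule: a determiner is followed by a noun
    if prev_tag == "PRON" and "VERB" in candidates:
        return "VERB"  # assumed rule: a pronoun is followed by a verb
    return sorted(candidates)[0]  # deterministic fallback when no rule fires


def tag(text):
    """Tag each token: lexicon lookup first, rules only when ambiguous."""
    tagged = []
    prev_tag = None
    for token in tokenize(text):
        candidates = LEXICON.get(token, {"UNK"})  # unknown words get UNK
        if len(candidates) == 1:
            chosen = next(iter(candidates))
        else:
            chosen = disambiguate(prev_tag, candidates)
        tagged.append((token, chosen))
        prev_tag = chosen
    return tagged


if __name__ == "__main__":
    print(tag("She reads the book."))
```

In this sketch the rules are consulted only for words with more than one candidate tag, so the lexicon decides the unambiguous cases and the preceding tag resolves the rest.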
Keywords :
identification technology; speech processing; word processing; Sindhi language; Sindhi lexicon; Sindhi part of speech tagger; computational Sindhi language processing; syntactic categories; word disambiguation rules; Algorithm design and analysis; Computer science; Dictionaries; Morphology; Natural languages; Signal processing; Speech recognition; Tagging; Vocabulary; Lexicon; Part of Speech; Sindhi; Tagging Rules
Conference_Title :
Signal Acquisition and Processing, 2010. ICSAP '10. International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-5724-3
Electronic_ISBN :
978-1-4244-5725-0
DOI :
10.1109/ICSAP.2010.27