DocumentCode :
1864659
Title :
Rule Based Part of Speech Tagging of Sindhi Language
Author :
Mahar, Javed Ahmed ; Memon, Ghulam Qadir
Author_Institution :
Dept. of Comput. Sci., Shah Abdul Latif Univ., Khairpur, Pakistan
fYear :
2010
fDate :
9-10 Feb. 2010
Firstpage :
101
Lastpage :
106
Abstract :
Part-of-speech (POS) tagging is the process of assigning the correct syntactic category to each word in a text. A tag set and word disambiguation rules are fundamental parts of any POS tagger. No tag set for the Sindhi language has been published to date, and no Sindhi lexicon for computational processing is available. In this study, a tag set for Sindhi POS tagging, a lexicon, and word disambiguation rules are designed and developed. The Sindhi corpus is collected from a comprehensive Sindhi dictionary and is based on the most recent available vocabulary used by local people. This paper presents the preliminary achievements of a rule-based Sindhi part-of-speech (SPOS) tagger. Tagging and tokenization algorithms are also designed for the implementation of SPOS. The outputs of SPOS are verified by a Sindhi linguist. The development of the SPOS tagger may be an important milestone toward computational Sindhi language processing.
Keywords :
identification technology; speech processing; word processing; Sindhi language; Sindhi lexicon; Sindhi part of speech tagger; computational Sindhi language processing; syntactic categories; word disambiguation rules; Algorithm design and analysis; Computer science; Dictionaries; Morphology; Natural languages; Signal processing; Speech recognition; Tagging; Vocabulary; Lexicon; Part of Speech; Sindhi; Tagging Rules
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Acquisition and Processing, 2010. ICSAP '10. International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-5724-3
Electronic_ISBN :
978-1-4244-5725-0
Type :
conf
DOI :
10.1109/ICSAP.2010.27
Filename :
5432667