DocumentCode
2146457
Title
Design considerations for developing a parts-of-speech tagset for Khasi
Author
Tham, Medari Janai
Author_Institution
Dept. of Comput. Sci., St. Anthony's Coll., Shillong, India
fYear
2012
fDate
30-31 March 2012
Firstpage
277
Lastpage
280
Abstract
Several tagsets have been developed for Indian languages belonging to the Indo-Aryan and Dravidian families, because the majority of the languages spoken in India belong to these two families. Khasi, on the other hand, belongs to the Austro-Asiatic family and is spoken primarily in the state of Meghalaya. To the best of my knowledge, language technology for Khasi is practically nonexistent and work on computational linguistics for the language is very scant. This poses a challenge for any attempt to provide language-based access to technology, because the basic tools needed are not available. There exists a common Part of Speech Tagset framework for Indian languages (IL-POSTS) covering the morphologically rich Indian languages under the Indo-Aryan and Dravidian families. However, in this paper the EAGLES guidelines are used for developing the Khasi tagset because of the natural affinity of the language to English. This affinity is evident from the script used, which is the Roman script, and from the word order, which is primarily SVO.
Keywords
computational linguistics; natural language processing; speech processing; Austro-Asiatic family; Dravidian families; EAGLES guidelines; English; IL-POSTS; Indian languages; Indo-Aryan families; Khasi; Meghalaya; NLP; Roman script; computational linguistic; design considerations; language technology; parts-of-speech tagset framework; spoken language; word order; Mood
fLanguage
English
Publisher
IEEE
Conference_Titel
Emerging Trends and Applications in Computer Science (NCETACS), 2012 3rd National Conference on
Conference_Location
Shillong
Print_ISBN
978-1-4577-0749-0
Type
conf
DOI
10.1109/NCETACS.2012.6203274
Filename
6203274