• DocumentCode
    2146457
  • Title

    Design considerations for developing a parts-of-speech tagset for Khasi

  • Author

    Tham, Medari Janai

  • Author_Institution
    Dept. of Comput. Sci., St. Anthony's Coll., Shillong, India
  • fYear
    2012
  • fDate
    30-31 March 2012
  • Firstpage
    277
  • Lastpage
    280
  • Abstract
    Several tagsets have been developed for Indian languages belonging to the Indo-Aryan and Dravidian families. This is because most of the languages spoken in India belong to these families. Khasi, on the other hand, belongs to the Austro-Asiatic family and is spoken primarily in the state of Meghalaya. To the best of my knowledge, language technology for Khasi is practically nonexistent, and work on computational linguistics for the language is very scant. This poses a challenge for any attempt to provide language-based access to technology, since the basic tools needed are not available. There exists a common Part of Speech Tagset framework for Indian languages (IL-POSTS) covering the morphologically rich Indian languages of the Indo-Aryan and Dravidian families. However, in this paper the EAGLES guidelines are used for developing the Khasi tagset because of the natural affinity of the language to English. This affinity is evident from the script used, which is the Roman script, and from the word order, which is primarily SVO.
  • Keywords
    computational linguistics; natural language processing; speech processing; Austro-Asiatic family; Dravidian families; EAGLES guidelines; English; IL-POSTS; Indian languages; Indo-Aryan families; Khasi; Meghalaya; NLP; Roman script; computational linguistic; design considerations; language technology; parts-of-speech tagset framework; spoken language; word order; Mood
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Emerging Trends and Applications in Computer Science (NCETACS), 2012 3rd National Conference on
  • Conference_Location
    Shillong
  • Print_ISBN
    978-1-4577-0749-0
  • Type

    conf

  • DOI
    10.1109/NCETACS.2012.6203274
  • Filename
    6203274