  • DocumentCode
    172564
  • Title
    Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus
  • Author
    Dinakaramani, Arawinda ; Rashel, Fam ; Luthfi, Andry ; Manurung, Ruli
  • Author_Institution
    Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
  • fYear
    2014
  • fDate
    20-22 Oct. 2014
  • Firstpage
    66
  • Lastpage
    69
  • Abstract
    We describe our work on designing a linguistically principled part-of-speech (POS) tagset for the Indonesian language. The process involved a detailed study and analysis of existing tagsets, followed by the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens that have been manually tagged using this tagset.
  • Keywords
    natural language processing; Indonesian POS tagset; Indonesian part-of-speech tagset; POS tagset; linguistically principled part-of-speech; manually tagged Indonesian corpus; Conferences; Context; Manuals; Pragmatics; Speech; Syntactics; Tagging; Indonesian; POS; Part of speech tagset
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2014 International Conference on
  • Conference_Location
    Kuching
  • Type
    conf
  • DOI
    10.1109/IALP.2014.6973519
  • Filename
    6973519