• DocumentCode
    172569
  • Title
    Building an Indonesian rule-based part-of-speech tagger
  • Author
    Rashel, Fam; Luthfi, Andry; Dinakaramani, Arawinda; Manurung, Ruli
  • Author_Institution
    Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
  • fYear
    2014
  • fDate
    20-22 Oct. 2014
  • Firstpage
    70
  • Lastpage
    73
  • Abstract
    This paper describes a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then assigns tags to every token, starting with closed-class words and moving on to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently obtains an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
  • Keywords
    knowledge based systems; natural language processing; Indonesian language; Indonesian rule-based part-of-speech tagger; closed-class words; multiword expression; named entity recognition; open-class words; rule-based approach; Accuracy; Buildings; Dictionaries; Natural language processing; Probabilistic logic; Speech; Tagging; disambiguation rule; part of speech tag; token
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asian Language Processing (IALP), 2014 International Conference on
  • Conference_Location
    Kuching
  • Type
    conf
  • DOI
    10.1109/IALP.2014.6973521
  • Filename
    6973521
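
The pipeline summarized in the abstract (tokenization with multi-word expressions, named-entity recognition, closed-class then open-class tagging, rule-based disambiguation) could be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the lexicons, the tag names (PRP, MD, VB, IN, NN, NNP, Z), the capitalization heuristic for named entities, and the single disambiguation rule are all assumptions made for this example, since the record does not give the paper's dictionaries, tagset, or rules.

```python
import re

# Hypothetical toy resources; the real system's dictionaries are not given here.
MULTIWORD = {("rumah", "sakit"), ("terima", "kasih")}
CLOSED_CLASS = {"saya": ["PRP"], "di": ["IN"], "dan": ["CC"], "bisa": ["MD"]}
OPEN_CLASS = {"makan": ["VB"], "ular": ["NN"], "bisa": ["NN"], "rumah sakit": ["NN"]}


def tokenize(text):
    """Split into words and punctuation, then merge known two-word expressions."""
    raw = re.findall(r"\w+|[^\w\s]", text)
    tokens, i = [], 0
    while i < len(raw):
        pair = tuple(w.lower() for w in raw[i:i + 2])
        if len(pair) == 2 and pair in MULTIWORD:
            tokens.append(" ".join(raw[i:i + 2]))
            i += 2
        else:
            tokens.append(raw[i])
            i += 1
    return tokens


def candidates(token):
    """Collect candidate tags: closed-class entries first, then open-class ones."""
    key = token.lower()
    tags = list(CLOSED_CLASS.get(key, []))
    for t in OPEN_CLASS.get(key, []):
        if t not in tags:
            tags.append(t)
    if tags:
        return tags
    if not any(ch.isalnum() for ch in token):
        return ["Z"]    # punctuation (illustrative tag name)
    if token[0].isupper():
        return ["NNP"]  # crude named-entity heuristic (assumption)
    return ["NN"]       # default for unknown words (assumption)


def disambiguate(cands):
    """Apply one hand-written rule: MD/NN ambiguity resolves to MD before a verb."""
    result = []
    for i, tags in enumerate(cands):
        if len(tags) == 1:
            result.append(tags[0])
            continue
        nxt = cands[i + 1] if i + 1 < len(cands) else []
        if "MD" in tags and "NN" in tags:
            result.append("MD" if "VB" in nxt else "NN")
        else:
            result.append(tags[0])
    return result


def tag(text):
    tokens = tokenize(text)
    return list(zip(tokens, disambiguate([candidates(t) for t in tokens])))


print(tag("Saya bisa makan di rumah sakit."))
```

In this sketch the ambiguous word "bisa" ("can" or "venom") is tagged as a modal when a verb candidate follows and as a noun otherwise; a real rule set would need many more rules of this kind.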