DocumentCode
172569
Title
Building an Indonesian rule-based part-of-speech tagger
Author
Rashel, Fam ; Luthfi, Andry ; Dinakaramani, Arawinda ; Manurung, Ruli
Author_Institution
Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
fYear
2014
fDate
20-22 Oct. 2014
Firstpage
70
Lastpage
73
Abstract
This paper describes work on a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then tags every token, proceeding from closed-class words to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently obtains an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
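The pipeline summarized in the abstract can be sketched as follows: whitespace tokenization that merges known multi-word expressions, closed-class lookup before open-class lookup, a crude capitalization heuristic standing in for named-entity recognition, and a left-to-right pass of hand-written disambiguation rules. This is a minimal illustration under stated assumptions, not the authors' implementation. Every lexicon entry, tag name (`PRP`, `NEG`, `VB`, `NN`, `NNP`, and so on), the `PRIORITY` fallback order, and the single disambiguation rule are hypothetical, as are the function names `tokenize`, `candidate_tags`, `disambiguate`, and `pos_tag`.

```python
# Minimal sketch of a rule-based POS tagging pipeline (illustrative only).
# All lexicons, tag names, and rules here are hypothetical, not the paper's.

# Hypothetical multi-word expressions (lowercased word pairs) kept as one token
MWES = {("rumah", "sakit"), ("kereta", "api")}

# Hypothetical closed-class lexicon: small, fixed word classes looked up first
CLOSED_CLASS = {
    "dan": "CC",     # coordinating conjunction
    "di": "IN",      # preposition
    "yang": "SC",    # relativizer
    "tidak": "NEG",  # negation
    "saya": "PRP",   # personal pronoun
}

# Hypothetical open-class lexicon; ambiguous words list several candidate tags
OPEN_CLASS = {
    "rumah sakit": {"NN"},
    "makan": {"VB", "NN"},
    "pergi": {"VB"},
    "besar": {"JJ"},
}

# Hypothetical fallback order used when no rule resolves an ambiguity
PRIORITY = ["NN", "VB", "JJ"]


def tokenize(text):
    """Split on whitespace, merging known two-word expressions into one token."""
    words = text.split()
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        if pair in MWES:
            tokens.append(" ".join(words[i:i + 2]))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def candidate_tags(tokens):
    """Closed-class lookup first, then open-class lookup, then heuristics."""
    result = []
    for idx, tok in enumerate(tokens):
        low = tok.lower()
        if low in CLOSED_CLASS:
            result.append({CLOSED_CLASS[low]})
        elif low in OPEN_CLASS:
            result.append(set(OPEN_CLASS[low]))
        elif tok[0].isupper() and idx > 0:
            # crude named-entity guess: capitalized and not sentence-initial
            result.append({"NNP"})
        else:
            result.append({"NN"})  # default tag for unknown words
    return result


def rank(tag):
    """Position of a tag in PRIORITY; unknown tags sort last."""
    return PRIORITY.index(tag) if tag in PRIORITY else len(PRIORITY)


def disambiguate(candidates):
    """Resolve ambiguous tokens left to right with hand-written rules."""
    tags = []
    for cands in candidates:
        if len(cands) == 1:
            tags.append(next(iter(cands)))
        elif "VB" in cands and tags and tags[-1] in ("PRP", "NEG"):
            # hypothetical rule: after a pronoun or "tidak", prefer a verb
            tags.append("VB")
        else:
            # no rule fired: take the highest-priority candidate
            # (sorted first so ties are broken deterministically)
            tags.append(min(sorted(cands), key=rank))
    return tags


def pos_tag(text):
    tokens = tokenize(text)
    return list(zip(tokens, disambiguate(candidate_tags(tokens))))


print(pos_tag("Saya tidak makan di rumah sakit"))
# [('Saya', 'PRP'), ('tidak', 'NEG'), ('makan', 'VB'), ('di', 'IN'), ('rumah sakit', 'NN')]
```

Checking the closed-class list before the open-class list mirrors the ordering described in the abstract: function words are few and rarely ambiguous, so fixing them first gives the disambiguation rules reliable context (here, the tag of the preceding token) for resolving ambiguous open-class words such as `makan`.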
Keywords
knowledge based systems; natural language processing; Indonesian language; Indonesian rule-based part-of-speech tagger; closed-class words; multiword expression; named entity recognition; open-class words; rule-based approach; Accuracy; Buildings; Dictionaries; Natural language processing; Probabilistic logic; Speech; Tagging; disambiguation rule; part of speech tag; token;
fLanguage
English
Publisher
ieee
Conference_Titel
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location
Kuching
Type
conf
DOI
10.1109/IALP.2014.6973521
Filename
6973521