• DocumentCode
    3731566
  • Title

    A Semi-supervised Algorithm for Indonesian Named Entity Recognition

  • Author

    Rezka Aufar Leonandya;Bayu Distiawan;Nursidik Heru Praptono

  • Author_Institution
    Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
  • fYear
    2015
  • Firstpage
    45
  • Lastpage
    50
  • Abstract
    Named Entity Recognition (NER) is a subfield of Information Extraction that can be used for machine translation, question answering, the semantic web, etc. One of the biggest challenges of NER is the difficulty of constructing manually labeled training data. In this work, we present a semi-supervised approach for Indonesian-language NER that is capable of creating high-quality training data automatically. The semi-supervised approach uses unlabeled data built from Wikipedia and DBPedia to form high-accuracy, non-redundant additional training data at each iteration of the semi-supervised process. We show that our system manages to generate new training data and achieves an increasing F1 score as the semi-supervised iterations proceed.
  • Keywords
    "Encyclopedias","Electronic publishing","Internet","Training data","Classification algorithms","Testing"
  • Publisher
    IEEE
  • Conference_Titel
    Computational and Business Intelligence (ISCBI), 2015 3rd International Symposium on
  • Type

    conf

  • DOI
    10.1109/ISCBI.2015.15
  • Filename
    7383535