• DocumentCode
    2813413
  • Title

    Semi-supervised learning for named entity recognition using weakly labeled training data

  • Author

    Zafarian, Atefeh ; Rokni, Ali ; Khadivi, Shahram ; Ghiasifard, Sonia

  • Author_Institution
    Dept. of Comput. Eng. & IT, Amirkabir Univ. of Technol., Tehran, Iran
  • fYear
    2015
  • fDate
    3-5 March 2015
  • Firstpage
    129
  • Lastpage
    135
  • Abstract
    The shortage of annotated training data remains an important challenge for many Natural Language Processing (NLP) tasks such as Named Entity Recognition (NER). NER requires a large amount of training data with a high degree of human supervision, yet sufficient labeled data is not available for every language. In this paper, we use unlabeled bilingual corpora to extract useful features by transferring information from a resource-rich language to a resource-poor language; using these features and a small amount of training data, we build a supervised NER model. We then apply a graph-based semi-supervised learning method that trains a CRF-based supervised classifier on the labeled data and uses its high-confidence predictions on the unlabeled data to expand the training set, improving the efficiency of the NER model with the new training set.
  • Keywords
    feature extraction; graph theory; learning (artificial intelligence); natural language processing; pattern classification; CRF-based supervised classifier; NER supervised model; NLP; annotated training data; feature extraction; graph-based semisupervised learning method; named entity recognition; natural language processing; unlabeled bilingual corpora; weakly labeled training data; Computational modeling; Data models; Feature extraction; Organizations; Semisupervised learning; Training; Training data; Bilingual parallel corpora; Named entity recognition; graph-based semi-supervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4799-8817-4
  • Type

    conf

  • DOI
    10.1109/AISP.2015.7123504
  • Filename
    7123504
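
The abstract describes expanding a small labeled set with high-confidence predictions on unlabeled data. The following is a minimal sketch of that idea as a plain self-training loop, not the paper's method: a small Naive Bayes tagger stands in for the CRF-based classifier, the graph-based propagation step is omitted, and the `features` function, the `threshold` value, and all class and function names are hypothetical choices made for illustration (the paper derives its features from bilingual corpora instead).

```python
import math
from collections import Counter, defaultdict


def features(token):
    # Hypothetical surface features; an assumption, not the paper's feature set.
    return [
        f"lower={token.lower()}",
        f"cap={token[:1].isupper()}",
        f"suffix={token[-2:].lower()}",
    ]


class NaiveBayesTagger:
    """Stand-in for the CRF-based classifier (assumption for this sketch)."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for token, label in examples:
            self.label_counts[label] += 1
            for feat in features(token):
                self.feat_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, token):
        """Return (best_label, confidence), confidence being a normalized posterior."""
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            # Add-one smoothing with one extra slot for unseen features.
            denom = sum(self.feat_counts[label].values()) + len(self.vocab) + 1
            score = math.log(count / total)
            for feat in features(token):
                score += math.log((self.feat_counts[label][feat] + 1) / denom)
            scores[label] = score
        top = max(scores.values())
        norm = sum(math.exp(s - top) for s in scores.values())
        best = max(scores, key=scores.get)
        return best, math.exp(scores[best] - top) / norm


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Move confidently predicted tokens into the training set, then retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = NaiveBayesTagger().fit(labeled)
    for _ in range(max_rounds):
        confident, rest = [], []
        for token in pool:
            label, conf = model.predict(token)
            if conf >= threshold:
                confident.append((token, label))
            else:
                rest.append(token)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
        model = NaiveBayesTagger().fit(labeled)
    return model, labeled
```

A call such as `self_train(seed_pairs, unlabeled_tokens)` returns the retrained model together with the expanded training set. The threshold trades coverage against label noise: a lower value adds more weakly labeled examples per round but raises the risk of reinforcing early mistakes.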