• DocumentCode
    2571874
  • Title
    CRF-based active learning for Chinese named entity recognition
  • Author
    Yao, Lin ; Sun, Chengjie ; Li, Shaofeng ; Wang, Xiaolong ; Wang, Xuan
  • Author_Institution
    Comput. Sci. Dept., HITSGS, Shenzhen, China
  • fYear
    2009
  • fDate
    11-14 Oct. 2009
  • Firstpage
    1557
  • Lastpage
    1561
  • Abstract
    Conditional random fields (CRFs) have been applied to many sequence labeling tasks with excellent results. Moreover, as a supervised model, a CRF depends strongly on large amounts of training data. Active learning is an alternative to relying on a large amount of random sampling. However, random sampling constructively participates in the optimal choice of training examples. Based on different query strategies, active learning can be combined with other machine learning methods to reduce annotation cost while maintaining accuracy. This paper proposes a new active learning strategy based on information density (ID), integrated with CRFs, for Chinese named entity recognition (NER). On the SIGHAN Bakeoff 2006 MSRA NER corpus, an F1 score of 77.2% is achieved using only 10,000 labeled training sentences chosen by the proposed active learning strategy.
  • Keywords
    graph theory; information retrieval; learning (artificial intelligence); natural language processing; CRF-based active learning strategy; Chinese named entity recognition; conditional random fields; information density; machine learning methods; query strategies; random sampling; supervised model; Computer science; Costs; Cybernetics; Hidden Markov models; Labeling; Learning systems; Machine learning; Sampling methods; Training data; USA Councils; active learning; conditional random field; information density; named entity recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
  • Conference_Location
    San Antonio, TX
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4244-2793-2
  • Electronic_ISBN
    1062-922X
  • Type
    conf
  • DOI
    10.1109/ICSMC.2009.5346315
  • Filename
    5346315