DocumentCode
2571874
Title
CRF-based active learning for Chinese named entity recognition
Author
Yao, Lin ; Sun, Chengjie ; Li, Shaofeng ; Wang, Xiaolong ; Wang, Xuan
Author_Institution
Comput. Sci. Dept., HITSGS, Shenzhen, China
fYear
2009
fDate
11-14 Oct. 2009
Firstpage
1557
Lastpage
1561
Abstract
Conditional random fields (CRFs) have been applied to many sequence labeling tasks with excellent results. However, as supervised models, CRFs depend heavily on large amounts of training data. Active learning offers an alternative to labeling a large, randomly sampled training set, although random sampling can still contribute constructively to selecting the optimal training examples. Depending on the query strategy, active learning can be combined with other machine learning methods to reduce annotation cost while maintaining accuracy. This paper proposes a new active learning strategy based on information density (ID), integrated with CRFs, for Chinese named entity recognition (NER). On the SIGHAN Bakeoff 2006 MSRA NER corpus, an F1 score of 77.2% is achieved using only 10,000 labeled training sentences selected by the proposed active learning strategy.
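To make the idea of an information-density query strategy concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' method, whose exact formulation is not given in this record. It assumes a trained CRF has already produced, for each unlabeled sentence, the probability of its best (Viterbi) labeling (`best_path_probs`, a hypothetical input), uses least confidence (1 minus that probability) as the uncertainty, and approximates sentence similarity with cosine similarity over bag-of-characters vectors. The names `information_density_scores`, `select_batch`, and the weighting exponent `beta` are illustrative only.

```python
import math
from collections import Counter


def char_vector(sentence):
    """Bag-of-characters vector; a simple stand-in for CRF feature vectors (assumption)."""
    return Counter(sentence)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def information_density_scores(sentences, best_path_probs, beta=1.0):
    """Score = uncertainty * (mean similarity to the other unlabeled sentences) ** beta.

    best_path_probs[i] is assumed (hypothetically) to be the CRF probability of the
    Viterbi labeling of sentences[i]; uncertainty is least confidence, 1 - that value.
    """
    vecs = [char_vector(s) for s in sentences]
    n = len(sentences)
    scores = []
    for i in range(n):
        uncertainty = 1.0 - best_path_probs[i]
        if n > 1:
            # Average similarity to every other sentence in the unlabeled pool.
            density = sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / (n - 1)
        else:
            density = 1.0
        scores.append(uncertainty * density ** beta)
    return scores


def select_batch(sentences, best_path_probs, k, beta=1.0):
    """Return the indices of the k highest-scoring sentences to send for annotation."""
    scores = information_density_scores(sentences, best_path_probs, beta)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

A sentence the CRF is unsure about that also resembles many other unlabeled sentences ranks highest. Weighting uncertainty by density in this way discourages spending annotation effort on uncertain but unrepresentative outliers.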
Keywords
graph theory; information retrieval; learning (artificial intelligence); natural language processing; CRF-based active learning strategy; Chinese named entity recognition; conditional random fields; information density; machine learning methods; query strategies; random sampling; supervised model; Computer science; Costs; Cybernetics; Hidden Markov models; Labeling; Learning systems; Machine learning; Sampling methods; Training data; USA Councils; active learning; conditional random field; information density; named entity recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
Conference_Location
San Antonio, TX
ISSN
1062-922X
Print_ISBN
978-1-4244-2793-2
Electronic_ISBN
1062-922X
Type
conf
DOI
10.1109/ICSMC.2009.5346315
Filename
5346315