DocumentCode :
2923143
Title :
Multi-Criterion Active Learning in Conditional Random Fields
Author :
Symons, Christopher T. ; Samatova, Nagiza F. ; Krishnamurthy, Ramya ; Park, Byung H. ; Umar, Tarik ; Buttler, David ; Critchlow, Terence ; Hysom, David
Author_Institution :
Oak Ridge National Laboratory, TN
fYear :
2006
fDate :
Nov. 2006
Firstpage :
323
Lastpage :
331
Abstract :
Conditional random fields (CRFs), which are popular supervised learning models for many natural language processing (NLP) tasks, typically require a large collection of labeled data for training. In practice, however, manual annotation of text documents is quite costly. Furthermore, even large labeled training sets can have arbitrarily limited performance peaks if they are not chosen with care. This paper considers the use of multi-criterion active learning for identification of a small but sufficient set of text samples for training CRFs. Our empirical results demonstrate that our method is capable of reducing the manual annotation costs, while also limiting the retraining costs that are often associated with active learning. In addition, we show that the generalization performance of CRFs can be enhanced through judicious selection of training examples.
Keywords :
generalisation (artificial intelligence); learning (artificial intelligence); natural language processing; random processes; text analysis; conditional random fields; manual annotation; multicriterion active learning; natural language processing; supervised learning models; text documents; training sets; Computational efficiency; Costs; Labeling; Laboratories; Learning systems; Markov random fields; Natural language processing; Supervised learning; Tagging; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Tools with Artificial Intelligence, 2006. ICTAI '06. 18th IEEE International Conference on
Conference_Location :
Arlington, VA
ISSN :
1082-3409
Print_ISBN :
0-7695-2728-0
Type :
conf
DOI :
10.1109/ICTAI.2006.90
Filename :
4031915