Title :
Semi-Supervised Active Learning for Modeling Medical Concepts from Free Text
Author :
Rosales, Rómer ; Krishnamurthy, Praveen ; Rao, R. Bharat
Author_Institution :
IKM CKS Siemens Med. Solutions, Malvern
Abstract :
We apply a new active learning formulation to the problem of learning medical concepts from unstructured text. The new formulation is based on maximizing the mutual information that a sample labeling provides about the retrieval/classification model. This methodology is related to and extends the Query-by-Committee (QBC) approach (Seung et al., 1992) by exploiting unlabeled data in novel ways, beyond their common use only as potential query points. Unlike QBC, this method allows us to employ unlabeled data in addition to labeled data in order to select more appropriate samples for labeling. The samples thus chosen are both informative and relevant according to a distribution of interest. This flexibility also allows us to tailor the model to arbitrary distributions relevant to the task at hand, in particular to the distribution of the test data. This formulation has implications in scenarios where the training and test distributions are different, or when a general model is adapted to a more specific model. Experiments were conducted to evaluate retrieval performance for natural-language text associated with various concepts of interest in the medical domain. We demonstrate the advantages of our formulation compared with QBC, the state-of-the-art active learning approach, and with random sample selection.
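The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a committee of models that each output class probabilities for every unlabeled candidate, and it uses a common committee-based estimate of the mutual information between a candidate's label and the model (entropy of the averaged prediction minus the average entropy of the individual predictions). The function names and the optional `relevance` weights, which stand in for "a distribution of interest" such as a density estimate under the test data, are hypothetical.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def mutual_information_scores(committee_probs):
    """Estimate, per candidate, how much its label would tell us about the model.

    committee_probs has shape (n_members, n_candidates, n_classes): each
    committee member's predicted class probabilities for each candidate.
    """
    mean_probs = committee_probs.mean(axis=0)
    total_uncertainty = entropy(mean_probs)                     # (n_candidates,)
    expected_uncertainty = entropy(committee_probs).mean(axis=0)
    # High when members are individually confident but disagree with each other.
    return total_uncertainty - expected_uncertainty


def select_query(committee_probs, relevance=None):
    """Return the index of the candidate to label next.

    relevance (assumption, not from the paper): optional non-negative weights,
    one per candidate, expressing how representative the candidate is of the
    distribution of interest.
    """
    scores = mutual_information_scores(committee_probs)
    if relevance is not None:
        scores = scores * np.asarray(relevance, dtype=float)
    return int(np.argmax(scores))


if __name__ == "__main__":
    # Two committee members, three candidates, two classes.
    probs = np.array([
        [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]],
        [[0.9, 0.1], [0.0, 1.0], [0.2, 0.8]],
    ])
    print(select_query(probs))
    print(select_query(probs, relevance=[1.0, 0.1, 1.0]))
```

Without weights, the sketch picks the candidate on which the committee disagrees most sharply. With weights, a candidate that is informative but unrepresentative of the target distribution can be passed over in favor of a more relevant one, which is the behavior the abstract attributes to using unlabeled data beyond its role as a pool of query points.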
Keywords :
information retrieval; learning (artificial intelligence); medical computing; text analysis; Query-by-Committee approach; active learning formulation; classification model; free text; medical concept learning; medical concepts; mutual information; natural-language text retrieval performance; query points; retrieval model; semisupervised active learning; test data distribution; unlabeled data; unstructured text; Application software; Art; Computer science; Information retrieval; Labeling; Machine learning; Mutual information; State feedback; Testing; Uncertainty;
Conference_Title :
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location :
Cincinnati, OH
Print_ISBN :
978-0-7695-3069-7
DOI :
10.1109/ICMLA.2007.103