Title :
Recognizing Biomedical Named Entities in the Absence of Human Annotated Corpora
Author :
Gu, Baohua ; Dahl, Veronica ; Popowich, Fred
Author_Institution :
Simon Fraser Univ., Burnaby
Date :
Aug. 30 2007-Sept. 1 2007
Abstract :
Biomedical named entity recognition is an important task in biomedical text mining. The dominant approach is currently supervised learning, which requires a sufficiently large human-annotated corpus for training. In this paper, we propose a novel approach that aims to minimize the annotation requirement. The idea is to use a dictionary, essentially a list of entity names compiled by domain experts, which is sometimes more readily available than the domain experts themselves. Given an unlabelled training corpus, we label its sentences by simple dictionary lookup, which yields highly reliable but incomplete positive data. We then run an SVM-based self-training process, in the spirit of semi-supervised learning, that iteratively learns from the positive and unlabelled data to build a reliable classifier. Our evaluation on the BioNLP-2004 shared task data sets suggests that the proposed method can be a feasible alternative to traditional approaches when human annotation is not available.
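The two steps named in the abstract, dictionary lookup followed by iterative self-training, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy longest-match strategy, the `max_len` window, the label names, and the `fit` callable (a stand-in for training an SVM and returning its decision function) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

POS, UNK = "POS", "UNLABELLED"  # hypothetical label names

def label_by_dictionary(tokens: Sequence[str],
                        dictionary: Set[Tuple[str, ...]],
                        max_len: int = 5) -> List[str]:
    """Mark tokens covered by a dictionary entry as POS (greedy longest match).

    `dictionary` holds entity names as tuples of lowercased tokens.
    Everything else stays UNLABELLED, since a miss does not prove a
    token is a non-entity (the dictionary is incomplete).
    """
    labels = [UNK] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = POS
            i += matched
        else:
            i += 1
    return labels

Scorer = Callable[[str], float]

def self_train(examples: Sequence[str],
               labels: Sequence[str],
               fit: Callable[[Sequence[str], Sequence[str]], Scorer],
               max_iter: int = 10,
               threshold: float = 0.0) -> List[str]:
    """Generic self-training loop over positive and unlabelled examples.

    `fit` is an assumed interface: it trains a classifier on the current
    labels and returns a scoring function (for an SVM, the signed
    distance to the hyperplane). Unlabelled examples scoring above
    `threshold` are promoted to POS, and training repeats until no
    label changes or `max_iter` is reached.
    """
    current = list(labels)
    for _ in range(max_iter):
        score = fit(examples, current)
        changed = False
        for k, x in enumerate(examples):
            if current[k] == UNK and score(x) > threshold:
                current[k] = POS
                changed = True
        if not changed:
            break
    return current
```

For example, with the dictionary `{("interleukin-2",), ("nf-kappa", "b")}`, the sentence "Activation of NF-kappa B induces interleukin-2 expression" gets POS on "NF-kappa", "B", and "interleukin-2", while the remaining tokens stay unlabelled and are left for the self-training loop to resolve.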
Keywords :
character recognition; classification; data mining; learning (artificial intelligence); medical computing; support vector machines; biomedical named entity recognition; biomedical text mining; dictionary lookup; human annotated corpora; self-training process; semisupervised learning; Abstracts; Dictionaries; Humans; Proteins; Supervised learning; Support vector machine classification; Target recognition; Text recognition
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1610-3
Electronic_ISBN :
978-1-4244-1611-0
DOI :
10.1109/NLPKE.2007.4368014