  • DocumentCode
    1909250
  • Title
    Recognizing Biomedical Named Entities in the Absence of Human Annotated Corpora
  • Author
    Gu, Baohua; Dahl, Veronica; Popowich, Fred
  • Author_Institution
    Simon Fraser Univ. Burnaby, Burnaby
  • fYear
    2007
  • fDate
    Aug. 30 2007-Sept. 1 2007
  • Firstpage
    74
  • Lastpage
    81
  • Abstract
    Biomedical named entity recognition is an important task in biomedical text mining. Currently the dominant approach is supervised learning, which requires a sufficiently large human-annotated corpus for training. In this paper, we propose a novel approach aimed at minimizing the annotation requirement. The idea is to use a dictionary, which is essentially a list of entity names compiled by domain experts and is sometimes more readily available than the experts themselves. Given an unlabelled training corpus, we label the sentences through simple dictionary lookup, which yields highly reliable but incomplete positive data. We then run an SVM-based self-training process, in the spirit of semi-supervised learning, that iteratively learns from the positive and unlabelled data to build a reliable classifier. Our evaluation on the BioNLP-2004 shared task data sets suggests that the proposed method can be a feasible alternative to traditional approaches when human annotation is not available.
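    The abstract outlines a two-stage pipeline: dictionary lookup to obtain reliable but incomplete positives, then SVM-based self-training on positive and unlabelled data. The Python sketch below illustrates that idea under assumptions not taken from the paper: the token features (`features`), single-token dictionary matching, a Pegasos-style linear SVM written from scratch, and the promotion rule in `self_train` (treat unlabelled tokens as negatives, then move any token the classifier scores above zero into the positive set) are all hypothetical choices, not the authors' method.

```python
# Hypothetical sketch of dictionary labelling + PU-style SVM self-training.
# Not the authors' implementation; features and promotion rule are assumptions.
import random


def features(token):
    """Hypothetical token features: word identity, 3-char suffix, and shape cues."""
    feats = {"w=" + token.lower(): 1.0, "suf3=" + token[-3:].lower(): 1.0, "bias": 1.0}
    if any(c.isdigit() for c in token):
        feats["has_digit"] = 1.0
    if any(c.isupper() for c in token[1:]):
        feats["inner_upper"] = 1.0
    return feats


def dictionary_label(sentences, dictionary):
    """Mark tokens found in the dictionary as positive (1); all others stay unlabelled (None)."""
    return [[(tok, 1 if tok.lower() in dictionary else None) for tok in sent]
            for sent in sentences]


def score(w, x):
    """Linear decision value w . x over sparse feature dicts."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(examples, epochs=30, lam=0.01, seed=0):
    """Linear SVM trained with Pegasos-style SGD on the hinge loss; y in {-1, +1}."""
    rng = random.Random(seed)
    data = list(examples)
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(w, x) < 1  # margin test uses the pre-update weights
            shrink = 1.0 - 1.0 / t          # L2 regularization step, equal to 1 - eta * lam
            for f in w:
                w[f] *= shrink
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def self_train(sentences, dictionary, max_iter=5):
    """PU-style self-training: dictionary hits are positives, everything else starts negative."""
    tokens = [tok for sent in dictionary_label(sentences, dictionary) for tok in sent]
    feats = [features(tok) for tok, _ in tokens]
    pos = {i for i, (_, y) in enumerate(tokens) if y == 1}
    w = {}
    for _ in range(max_iter):
        # Unlabelled tokens are treated as (noisy) negatives for this round.
        data = [(feats[i], 1 if i in pos else -1) for i in range(len(tokens))]
        w = train_linear_svm(data)
        # Promote unlabelled tokens that the current classifier scores as positive.
        promoted = {i for i in range(len(tokens))
                    if i not in pos and score(w, feats[i]) > 0}
        if not promoted:
            break
        pos |= promoted
    return w, pos


if __name__ == "__main__":
    sents = [["IL-2", "activates", "NF-kappaB"], ["IL-4", "binds", "the", "receptor"]]
    flat = [tok for s in sents for tok in s]
    _, positives = self_train(sents, {"il-2", "nf-kappab"})
    print(sorted(flat[i] for i in positives))
```

    Because the positive set only grows, dictionary hits are never discarded; the open question in any such loop is how aggressively to promote unlabelled tokens, since noisy negatives can suppress true entities. A system evaluated on the BioNLP-2004 data would also need multi-word dictionary matching and BIO-style boundary tags, which this sketch omits.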
  • Keywords
    character recognition; classification; data mining; learning (artificial intelligence); medical computing; support vector machines; biomedical named entities recognition; biomedical text mining; dictionary lookup; human annotated corpora; self-training process; semisupervised learning; Abstracts; Dictionaries; Humans; Proteins; Semisupervised learning; Supervised learning; Support vector machine classification; Support vector machines; Target recognition; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1610-3
  • Electronic_ISBN
    978-1-4244-1611-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2007.4368014
  • Filename
    4368014