Regional Information Center for Science and Technology (RICeST) - ProtChew: Automatic Extraction of Protein Names from Biomedical Literature

DocumentCode :

2158179

Title :

ProtChew: Automatic Extraction of Protein Names from Biomedical Literature

Author :

Tveit, Amund ; Sætre, Rune ; Lægreid, Astrid ; Steigedal, Tonje Strømmen

Author_Institution :

NTNU, Norway

fYear :

2005

fDate :

05-08 April 2005

Firstpage :

1161

Lastpage :

1161

Abstract :

With the increasing amount of biomedical literature, there is a need for automatic information extraction to support biomedical researchers. Because biomedical information databases are incomplete, extraction is not straightforward with dictionaries alone, and several approaches using contextual rules and machine learning have previously been proposed. Our work is inspired by these approaches but is novel in that it is fully automatic and does not rely on expert-tagged corpora. The main ideas are 1) unigram tagging of corpora with known protein names to generate training examples for the protein-name extraction classifier, and 2) tight positive and negative examples, using protein-related words as negative examples and protein names/synonyms as positive examples. We present preliminary results on Medline abstracts about gastrin; further work will test the approach on the BioCreative benchmark data sets.
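The two main ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the whitespace tokenizer, the one-word context window used as features, the naive Bayes classifier, the seed lists KNOWN_PROTEINS and PROTEIN_RELATED, and all function names are hypothetical assumptions, since the abstract specifies none of them. Tokens that match a known protein name become positive examples, tokens that match protein-related words become tight negative examples, and the classifier learns only from the surrounding context.

```python
"""Hypothetical sketch: dictionary-driven (unigram) labelling of training
examples for a protein-name classifier. Not the ProtChew authors' code."""
import math
from collections import Counter, defaultdict

# Hypothetical seed lists: known protein names/synonyms are positives,
# protein-related (but non-name) words are "tight" negatives.
KNOWN_PROTEINS = {"gastrin", "cck2r", "egfr"}
PROTEIN_RELATED = {"protein", "receptor", "hormone", "expression"}


def tokenize(sentence):
    """Assumed tokenizer: split on whitespace, strip punctuation, lowercase."""
    return [t.strip(".,;:()").lower() for t in sentence.split() if t.strip(".,;:()")]


def context_features(tokens, i, window=1):
    """Features for token i: its neighbouring words only. The token itself is
    excluded so the classifier cannot simply memorise the seed dictionary."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats.append(f"{offset}:{word}")
    return feats


def label_corpus(sentences):
    """Unigram tagging: each token found in a seed list becomes a labelled
    example (True = protein name); all other tokens stay unlabelled."""
    examples = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok in KNOWN_PROTEINS:
                examples.append((context_features(tokens, i), True))
            elif tok in PROTEIN_RELATED:
                examples.append((context_features(tokens, i), False))
    return examples


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed classifier;
    the abstract does not name one)."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        for feats, label in examples:
            self.feat_counts[label].update(feats)
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def log_score(self, feats, label):
        total = sum(self.feat_counts[label].values())
        n_examples = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_examples)
        for f in feats:
            score += math.log((self.feat_counts[label][f] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, feats):
        return max(self.class_counts, key=lambda lab: self.log_score(feats, lab))


if __name__ == "__main__":
    corpus = [
        "Gastrin binds the CCK2R receptor.",
        "Expression of gastrin was increased.",
        "The protein was expressed in cells.",
        "The hormone binds the receptor.",
    ]
    model = NaiveBayes().fit(label_corpus(corpus))
    tokens = tokenize("Levels of somatostatin were measured.")
    print(model.predict(context_features(tokens, 2)))  # → True
```

Because the seed words themselves are excluded from the features, the classifier can in principle tag names that are absent from the dictionary (here "somatostatin") from context alone; using protein-related words as negatives forces it to separate names from generic protein vocabulary that appears in similar contexts.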

Keywords :

Abstracts; Biological materials; Biomedical computing; Biomedical materials; Cancer; Data mining; Databases; Information science; Protein engineering; Training data

fLanguage :

English

Publisher :

IEEE

Conference_Title :

21st International Conference on Data Engineering Workshops, 2005

Print_ISBN :

0-7695-2657-8

Type :

conf

DOI :

10.1109/ICDE.2005.268

Filename :

1647764

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2158179