DocumentCode :
3256102
Title :
Pattern Learning through Distant Supervision for Extraction of Protein-Residue Associations in the Biomedical Literature
Author :
Ravikumar, K.E. ; Liu, Haibin ; Cohn, Judith D. ; Wall, Michael E. ; Verspoor, Karin
Author_Institution :
Sch. of Med., Univ. of Colorado, Aurora, CO, USA
Volume :
2
fYear :
2011
fDate :
18-21 Dec. 2011
Firstpage :
59
Lastpage :
65
Abstract :
We propose a method enabling automatic extraction of protein-specific residues from the biomedical literature. We aim to associate mentions of specific amino acids with the protein of which the residue forms a part. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method used linguistic patterns to identify amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic and semantic patterns corresponding to protein-residue pairs mentioned in the text. On a new, automatically generated data set of high-confidence protein-residue relationship sentences, established through distant supervision, the method achieved an F-measure of 0.78. This work will pave the way to improved extraction of protein functional residues from the literature.
Keywords :
biology computing; data mining; graph theory; learning (artificial intelligence); proteins; F-measure; amino acid residue; automated graph-based method; automatic extraction; automatically generated data set; biomedical literature; distant supervision; high confidence protein-residue relationship sentences; linguistic patterns; pattern learning; protein function prediction; protein functional residues extraction; protein functional site extraction; protein-residue associations extraction; protein-residue pairs; protein-specific residues; semantic patterns; specific amino acids; syntactic patterns; Abstracts; Amino acids; Data mining; Gold; Protein engineering; Proteins; Silver; Mutation mining; distant supervision; information extraction; pattern learning; protein residue mining; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
Type :
conf
DOI :
10.1109/ICMLA.2011.112
Filename :
6147049