DocumentCode :
3684650
Title :
Predicting protein function from biomedical text
Author :
Kamal Taha;Paul D. Yoo
Author_Institution :
Electrical and Computer Engineering Department, Khalifa University, UAE
fYear :
2015
Firstpage :
3275
Lastpage :
3278
Abstract :
We propose a classifier system called PFPBT that predicts the functions of unannotated proteins. PFPBT assigns an unannotated protein p the functional category of the annotated proteins that are semantically similar to p. Each protein p is represented by a vector of weights. Each weight reflects the significance of a molecule m in the biomedical abstracts associated with p; that is, it quantifies the likelihood of an association between m and p. The rationale is that all proteins bind to other molecules, which are highly predictive of the proteins' functions. Let S be the set of proteins that are semantically similar to an unannotated protein p. p is annotated with the functional category f if its occurrence probability in the abstracts associated with the members of S whose functional category is f differs statistically significantly from its occurrence probability in the abstracts associated with the members of S that belong to all other functional categories. PFPBT automatically extracts each co-occurrence of a protein-molecule pair that represents a semantic relationship between the pair. We present novel semantic rules, based on the syntactic structures of sentences, for identifying the semantic relationship in each co-occurrence of a protein-molecule pair in a sentence. We evaluated PFPBT by comparing it experimentally with two other systems. The results showed an improvement.
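The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the relative-frequency weighting, the cosine similarity measure, and the majority vote (used here in place of the statistical significance test the abstract mentions) are all assumptions made for illustration, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def weight_vector(cooccurrences):
    """Map a list of molecule names (one entry per extracted co-occurrence
    with the protein) to relative-frequency weights. Assumed weighting."""
    counts = Counter(cooccurrences)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (assumed measure)."""
    dot = sum(w * v.get(m, 0.0) for m, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_category(query, annotated, k=3):
    """Return the most common category among the k annotated proteins most
    similar to the query. A majority vote stands in for the significance
    test described in the abstract."""
    ranked = sorted(annotated, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (weight vector, functional category) per annotated protein.
annotated = [
    (weight_vector(["ATP", "ATP", "Mg2+"]), "kinase"),
    (weight_vector(["ATP", "Mg2+"]), "kinase"),
    (weight_vector(["DNA", "DNA", "zinc"]), "transcription factor"),
]
query = weight_vector(["ATP", "Mg2+", "ATP", "zinc"])
print(predict_category(query, annotated, k=2))
```

In this toy example the query shares most of its weight with the two ATP-binding proteins, so they rank highest and determine the predicted category.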
Keywords :
"Proteins","Protein engineering","Semantics","Pragmatics","Syntactics","Feature extraction"
Publisher :
IEEE
Conference_Title :
Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE
ISSN :
1094-687X
Electronic_ISBN :
1558-4615
Type :
conf
DOI :
10.1109/EMBC.2015.7319091
Filename :
7319091