DocumentCode :
3495695
Title :
Extraction of gene regulatory networks from biological literature
Author :
Tangirala, Karthik ; Caragea, Doina
Author_Institution :
Dept. of Comput. & Inf. Sci., Kansas State Univ., Manhattan, KS, USA
fYear :
2013
fDate :
12-14 June 2013
Firstpage :
1
Lastpage :
6
Abstract :
A gene regulatory network (GRN) is a network of interacting cellular components. The components are genes and their products, and the interactions represent regulatory relationships among genes, specifically activation and inhibition of gene expression, under certain conditions. Many regulatory relationships are known in the literature. However, assembling isolated relationships into networks is a challenging, albeit important, task. We have developed a system for automatically extracting GRNs from the literature. As a first filtering step, our system makes use of Textpresso (an existing ontology-based system for information retrieval from scientific literature) to identify document sentences that contain genes of interest and regulatory relationships involving those genes. The ontology-annotated Textpresso search results, provided in XML format, are examined for regular expressions that specify regulatory relationships. We use a set of positively labeled relations between genes (equivalently, pairs of genes that exhibit a regulatory relationship) to infer scores for patterns present in such relations. The pattern scores are further used to compute total scores for putative relationships between pairs of genes present in a sentence. Pairs with a total score greater than a threshold are assumed to be positive. The availability of only small amounts of labeled relationships, along with large amounts of unlabeled data, motivated us to also study the performance of our approach in a semi-supervised setting. We show that unlabeled data can, in some cases, improve the performance of our approach.
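The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern set `PATTERNS`, the scoring rule (the fraction of positively labeled sentences in which a pattern occurs), the summation of matched-pattern scores, and all function names are assumptions made for illustration only. The abstract does not specify how pattern scores are actually inferred or combined.

```python
import re
from collections import Counter

# Hypothetical regular-expression patterns for regulatory relationships.
# The pattern set used in the paper is not given in the abstract.
PATTERNS = {
    "activates": re.compile(r"\bactivat(?:es|ed|ion)\b", re.IGNORECASE),
    "inhibits": re.compile(
        r"\b(?:inhibit(?:s|ed|ion)|repress(?:es|ed|ion))\b", re.IGNORECASE
    ),
    "regulates": re.compile(r"\bregulat(?:es|ed|ion)\b", re.IGNORECASE),
}


def matched_patterns(sentence):
    """Return the names of all patterns that occur in the sentence."""
    return {name for name, rx in PATTERNS.items() if rx.search(sentence)}


def learn_pattern_scores(positive_sentences):
    """Assumed scoring rule: a pattern's score is the fraction of
    positively labeled sentences in which it occurs."""
    counts = Counter()
    for sentence in positive_sentences:
        counts.update(matched_patterns(sentence))
    n = len(positive_sentences)
    if n == 0:
        return {name: 0.0 for name in PATTERNS}
    return {name: counts[name] / n for name in PATTERNS}


def total_score(sentence, scores):
    """Assumed combination rule: sum the scores of all matched patterns."""
    return sum(scores[name] for name in matched_patterns(sentence))


def is_positive(sentence, scores, threshold):
    """A sentence's gene pair is assumed positive if its total score
    is strictly greater than the threshold."""
    return total_score(sentence, scores) > threshold


# Made-up example sentences standing in for positively labeled relations.
positives = [
    "lin-3 activates let-23 signaling.",
    "lin-12 inhibits lag-2 expression.",
    "Activation of let-60 by lin-3 was observed.",
    "daf-16 is regulated by daf-2 and activates sod-3.",
]
scores = learn_pattern_scores(positives)
```

In a semi-supervised variant, sentences classified as positive with high total scores could be added to the labeled set and the pattern scores re-estimated; this is one common self-training scheme, and the abstract does not state which scheme the authors used.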
Keywords :
XML; biology computing; cellular biophysics; feature extraction; filtering theory; genetics; information filtering; learning (artificial intelligence); ontologies (artificial intelligence); pattern classification; XML format; biological literature; cellular component interaction; document sentences; filtering step; gene expression activation; gene expression inhibition; gene regulatory network extraction; genes-of-interest; information retrieval; isolated relationship assembly; ontology-annotated Textpresso search results; ontology-based system; pattern scores; regulatory relationships; scientific literature; semisupervised setting; unlabeled data motivation; Data mining; Data models; Dictionaries; Proteins; Training; Gene regulatory networks; natural language processing; ontology; regular expressions; semi-supervised learning; supervised learning; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2013 IEEE 3rd International Conference on
Conference_Location :
New Orleans, LA
Type :
conf
DOI :
10.1109/ICCABS.2013.6629200
Filename :
6629200