DocumentCode :
844580
Title :
Data Mining and Predictive Modeling of Biomolecular Network from Biomedical Literature Databases
Author :
Hu, Xiaohua ; Wu, Daniel D.
Author_Institution :
Coll. of Information Sci. & Tech., Drexel Univ., Philadelphia, PA
Volume :
4
Issue :
2
fYear :
2007
Firstpage :
251
Lastpage :
263
Abstract :
In this paper, we present a novel approach, Bio-IEDM (biomedical information extraction and data mining), that integrates text mining and predictive modeling to analyze biomolecular networks from biomedical literature databases. Our method consists of two phases. In phase 1, we describe an efficient semisupervised learning approach that automatically extracts biological relationships, such as protein-protein and protein-gene interactions, from biomedical literature databases in order to construct the biomolecular network. Our method automatically learns patterns from a few user-supplied seed tuples and then extracts new tuples from the biomedical literature based on the discovered patterns. The derived biomolecular network forms a large scale-free network graph. In phase 2, we present a novel clustering algorithm that analyzes the biomolecular network graph to identify biologically meaningful subnetworks (communities). The clustering algorithm takes into account the characteristics of scale-free network graphs and is based on the local density of each vertex and its neighborhood functions, which allows it to find more meaningful clusters at different density levels. The experimental results indicate that our approach is very effective in extracting biological knowledge from a huge collection of biomedical literature. The integration of data mining and information extraction provides a promising direction for analyzing biomolecular networks.
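The seed-tuple bootstrapping described in phase 1 can be illustrated with the minimal Python sketch below. It is not the authors' Bio-IEDM implementation. It assumes a hypothetical, pre-built list of protein/gene names (`entities`) in place of named-entity recognition, treats the literal text between two entity mentions as a "pattern", and uses hypothetical function names (`learn_patterns`, `extract_tuples`, `bootstrap`) and a hypothetical word limit on patterns. It only shows the learn-patterns/extract-tuples loop; scoring of patterns and filtering of noisy tuples are omitted.

```python
Pair = tuple[str, str]


def learn_patterns(sentences: list[str], tuples: set[Pair], max_words: int = 4) -> set[str]:
    """Collect the text between the two entities of each known tuple as a pattern."""
    patterns: set[str] = set()
    for s in sentences:
        for a, b in tuples:
            i, j = s.find(a), s.find(b)
            # Both entities present, with the first one ending before the second starts.
            if i != -1 and j != -1 and i + len(a) <= j:
                middle = s[i + len(a):j].strip()
                # Keep short, non-empty contexts only (hypothetical filter).
                if middle and len(middle.split()) <= max_words:
                    patterns.add(middle)
    return patterns


def extract_tuples(sentences: list[str], patterns: set[str], entities: list[str]) -> set[Pair]:
    """Find literal 'A <pattern> B' occurrences for every ordered pair of known entities."""
    found: set[Pair] = set()
    for s in sentences:
        for p in patterns:
            for a in entities:
                for b in entities:
                    if a != b and f"{a} {p} {b}" in s:
                        found.add((a, b))
    return found


def bootstrap(sentences: list[str], seeds: set[Pair], entities: list[str],
              iterations: int = 3) -> set[Pair]:
    """Alternate pattern learning and tuple extraction, starting from the seed tuples."""
    tuples = set(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(sentences, tuples)
        new = extract_tuples(sentences, patterns, entities) - tuples
        if not new:
            break  # converged: no new tuples were found in this round
        tuples |= new
    return tuples


if __name__ == "__main__":
    corpus = [
        "RAD51 interacts with BRCA2 in vivo.",
        "TP53 interacts with MDM2.",
        "ATM phosphorylates CHK2.",
    ]
    names = ["RAD51", "BRCA2", "TP53", "MDM2", "ATM", "CHK2"]
    print(sorted(bootstrap(corpus, {("RAD51", "BRCA2")}, names)))
```

With the single seed ("RAD51", "BRCA2"), the pattern "interacts with" is learned and then matches "TP53 interacts with MDM2", whereas ("ATM", "CHK2") is not extracted because no known tuple produced the pattern "phosphorylates".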
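Phase 2 relies on the local density of vertices. As a rough illustration only, the Python sketch below implements a generic greedy seed-and-grow clustering in the spirit of density-based complex-detection methods such as MCODE. It is not the clustering algorithm proposed in this paper and does not explicitly model scale-free properties. Local density is assumed here to be the edge density of a vertex's closed neighborhood; the `threshold` parameter and all function names are hypothetical.

```python
from collections import defaultdict

Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build undirected adjacency sets; self-loops are ignored."""
    adj: defaultdict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def density(nodes: set[str], adj: Graph) -> float:
    """Fraction of possible edges present among `nodes` (1.0 for a clique)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
    return 2 * edges / (n * (n - 1))


def local_density(v: str, adj: Graph) -> float:
    """Density of v's closed neighborhood (v plus its neighbors); an assumed definition."""
    return density(adj[v] | {v}, adj)


def density_clusters(edges: list[tuple[str, str]], threshold: float = 0.7) -> list[set[str]]:
    """Greedily grow clusters from the densest unassigned vertex."""
    adj = build_graph(edges)
    unassigned = set(adj)
    clusters: list[set[str]] = []
    while unassigned:
        # Seed: highest local density, ties broken by degree, then alphabetically.
        seed = max(sorted(unassigned), key=lambda v: (local_density(v, adj), len(adj[v])))
        cluster = {seed}
        grown = True
        while grown:
            grown = False
            # Candidates: unassigned neighbors of the current cluster.
            frontier = {u for v in cluster for u in adj[v]} & (unassigned - cluster)
            for c in sorted(frontier):
                # Accept a candidate only if the cluster stays dense enough.
                if density(cluster | {c}, adj) >= threshold:
                    cluster.add(c)
                    grown = True
        clusters.append(cluster)
        unassigned -= cluster
    return clusters


if __name__ == "__main__":
    two_triangles = [("A", "B"), ("B", "C"), ("A", "C"),
                     ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
    print([sorted(c) for c in density_clusters(two_triangles)])
```

On two triangles joined by the single edge C-D, a threshold of 0.7 keeps the bridge from merging them: adding D to {A, B, C} would leave 4 of 6 possible edges (density of about 0.67).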
Keywords :
biochemistry; data mining; genetics; graphs; learning (artificial intelligence); medical information systems; molecular biophysics; prediction theory; statistical analysis; Bio-IEDM; biomedical information extraction; biomedical literature databases; biomolecular network; clustering algorithm; data mining; large scale-free network graph; predictive modeling; protein-gene interaction; protein-protein interaction; semisupervised efficient learning; text mining; Bioinformatics; Biological cells; Biology computing; Clustering algorithms; Data mining; Databases; Information analysis; Predictive models; Proteins; Text mining; Biomolecular network; biological complexes (communities); information extraction; scale-free network; semisupervised learning; Algorithms; Artificial Intelligence; Computer Simulation; Data Interpretation, Statistical; Database Management Systems; Databases, Protein; Gene Expression; Information Storage and Retrieval; Models, Biological; Natural Language Processing; Periodicals as Topic; Protein Interaction Mapping; Proteome; Signal Transduction;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.070211
Filename :
4196536