Title :
Graph and Topological Structure Mining on Scientific Articles
Author :
Wang, Fan ; Jin, Ruoming ; Agrawal, Gagan ; Piontkivska, Helen
Author_Institution :
Ohio State Univ., Columbus
Abstract :
In this paper, we investigate a new approach for literature mining. We use frequent subgraph mining, and its generalization, topological structure mining, to find interesting relationships between gene names and other key biological terms in the text of scientific articles. We show how to find keywords of interest and represent them as nodes of graphs. We also propose several methods for inserting edges between these nodes. Our study initially focused on comparing: 1) different methods for constructing edges, and 2) patterns found by subgraph mining and by topological structure mining. Subsequently, we analyzed several frequent topological minors reported by our experiments and explained their scientific significance. Our study shows the following. First, a simple method of constructing edges, based on sliding windows, seems to provide the best results. Second, we are able to find a much larger number of well-known and meaningful topological patterns with high support values than subgraphs. Overall, the frequent topological minors our algorithm found correspond well to known relationships between genes and biological terms. Thus, we believe that topological structure mining can be a very valuable tool for researchers who are not deeply familiar with the existing literature and want a quick summary of known relationships among key scientific names or terms.
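The abstract does not spell out how the sliding-window edge construction works, so the following is only a minimal sketch of one plausible reading: two keywords are joined by an edge whenever they co-occur within a fixed number of consecutive tokens. The function name `sliding_window_edges`, the `window` parameter, the whitespace tokenization, and the placeholder keywords (`geneA`, `termX`, and so on) are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def sliding_window_edges(tokens, keywords, window=5):
    """Return undirected keyword edges found by a sliding token window.

    Assumption (not from the paper): an edge links two distinct keywords
    whenever both appear inside the same run of `window` consecutive tokens.
    Each edge is a tuple with its two endpoints in sorted order.
    """
    edges = set()
    # Slide the window one token at a time; a text shorter than the
    # window is treated as a single window.
    for start in range(max(1, len(tokens) - window + 1)):
        in_window = sorted({t for t in tokens[start:start + window] if t in keywords})
        edges.update(combinations(in_window, 2))
    return edges


# Made-up sentence with placeholder names, for illustration only.
tokens = "geneA binds to geneB in the nucleus while geneC regulates termX".split()
keywords = {"geneA", "geneB", "geneC", "termX"}

edges = sliding_window_edges(tokens, keywords, window=4)
for a, b in sorted(edges):
    print(a, "--", b)
```

With a window of 4 tokens, only keywords that sit close together are linked; widening the window to cover the whole sentence would connect every pair of keywords. The resulting per-article graphs are the kind of input a frequent subgraph or topological minor miner would then consume.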
Keywords :
arrays; biology computing; data mining; edge detection; genetics; graph theory; molecular biophysics; biological terms; edge constructing methods; gene microarrays; gene names; literature mining; scientific articles; sliding window method; subgraph mining; topological minors; topological patterns; topological structure mining; Biology; Computer science; Data mining; Diseases; Pattern matching; Proteins; Sequences; Social network services;
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375739