DocumentCode :
1266549
Title :
GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics
Author :
Smalter, Aaron ; Huan, Jun Luke ; Jia, Yi ; Lushington, Gerald
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
Volume :
7
Issue :
2
fYear :
2010
Firstpage :
197
Lastpage :
207
Abstract :
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called the graph pattern diffusion (GPD) kernel. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) in building highly accurate graph classification models. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call "pattern diffusion" to label nodes in the graphs. Finally, we designed a graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm using a number of chemical structure data sets. The experimental results demonstrate that our method is significantly better than competing methods such as those kernel functions based on paths, cycles, and subgraphs.
Keywords :
biochemistry; bioinformatics; chemical structure; chemistry computing; data mining; diffusion; genetics; molecular biophysics; proteins; support vector machines; accurate graph classification; chemical structure data; cheminformatics; gene regulation networks; graph alignment algorithm; graph data mining; graph database; graph pattern diffusion kernel; kernel classifier; protein sequences; protein structures; support vector machine; Graph classification; frequent subgraph mining; graph alignment; Algorithms; Animals; Computational Biology; Computer Simulation; Data Mining; Databases, Factual; Enzyme Inhibitors; Female; Humans; Intestinal Mucosa; Male; Mice; Molecular Conformation; Pattern Recognition, Automated; Pharmaceutical Preparations; Pharmacokinetics; Rats
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2009.80
Filename :
5313793