Title :
GPM: A graph pattern matching kernel with diffusion for chemical compound classification
Author :
Smalter, Aaron ; Huan, Jun ; Lushington, Gerald
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS
Abstract :
Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge. In this paper, we present a novel technique called the Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and explore their application to kernel classifiers (e.g. support vector machines) for graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the database and use a diffusion process to label nodes in the graphs. Finally, the kernel is computed using a set matching algorithm. We performed experiments on 16 chemical structure data sets and compared our method to other major graph kernels. The experimental results demonstrate excellent performance of our method.
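The abstract names three stages (pattern-based node labels, diffusion over the graph, set matching) but gives no formulas, so the following Python sketch is only a toy illustration of that general shape, not the authors' algorithm. Everything in it is an assumption: the per-node pattern-membership vectors are taken as given (a frequent subgraph miner would supply them), the neighbour-averaging update with parameters `alpha` and `steps` is one plausible diffusion rule, and the brute-force one-to-one matching stands in for whatever set matching algorithm the paper uses. The function names `diffuse`, `matching_score` and `toy_graph_kernel` are hypothetical.

```python
from itertools import permutations


def diffuse(adj, labels, alpha=0.5, steps=2):
    """Blend each node's vector with the mean of its neighbours' vectors.

    adj:    dict mapping node -> list of neighbouring nodes
    labels: dict mapping node -> list of floats (assumed pattern memberships)
    """
    cur = {v: list(vec) for v, vec in labels.items()}
    for _ in range(steps):
        nxt = {}
        for v, vec in cur.items():
            nbrs = adj[v]
            if not nbrs:  # isolated node: nothing to diffuse from
                nxt[v] = list(vec)
                continue
            mean = [sum(cur[u][i] for u in nbrs) / len(nbrs)
                    for i in range(len(vec))]
            nxt[v] = [(1 - alpha) * vec[i] + alpha * mean[i]
                      for i in range(len(vec))]
        cur = nxt
    return cur


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_score(vecs_a, vecs_b):
    """Best one-to-one matching between two small sets of node vectors.

    Brute force over all assignments, so only usable for tiny graphs.
    """
    small, large = sorted((list(vecs_a), list(vecs_b)), key=len)
    return max(
        sum(dot(small[i], large[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(large)), len(small))
    )


def toy_graph_kernel(graph_a, graph_b, alpha=0.5, steps=2):
    """Diffuse node labels in both graphs, then score the best node matching."""
    (adj_a, lab_a), (adj_b, lab_b) = graph_a, graph_b
    da = diffuse(adj_a, lab_a, alpha, steps)
    db = diffuse(adj_b, lab_b, alpha, steps)
    return matching_score(da.values(), db.values())


if __name__ == "__main__":
    # Two tiny made-up graphs with two hypothetical patterns each.
    g1 = ({"a": ["b"], "b": ["a", "c"], "c": ["b"]},
          {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]})
    g2 = ({"x": ["y"], "y": ["x"]},
          {"x": [1.0, 0.0], "y": [0.0, 1.0]})
    print(toy_graph_kernel(g1, g2))
```

A similarity score of this kind could in principle be supplied to a kernel classifier as a precomputed matrix, although a best-matching score is not guaranteed to be positive semi-definite in general, so that step would need checking for any real use.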
Keywords :
bioinformatics; drugs; graph theory; pattern matching; GPM; chemical compound classification; cheminformatics; diffusion; drug design; graph pattern matching kernel; Biological system modeling; Buildings; Chemical compounds; Databases; Drugs; Kernel; Organizing; Pattern matching; Predictive models; Support vector machines;
Conference_Title :
BioInformatics and BioEngineering, 2008. BIBE 2008. 8th IEEE International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4244-2844-1
Electronic_ISBN :
978-1-4244-2845-8
DOI :
10.1109/BIBE.2008.4696654