Title :
A Graph Mining Algorithm for Classifying Chemical Compounds
Author :
Lam, Winnie W M ; Chan, Keith C C
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong
Abstract :
Graph data mining algorithms are increasingly applied to biological graph datasets. However, while existing graph mining algorithms can identify frequently occurring sub-graphs, these do not necessarily represent useful patterns. In this paper, we propose a novel graph mining algorithm, MIGDAC (Mining Graph DAta for Classification), that applies graph theory and an interestingness measure to discover interesting sub-graphs that both characterize a class and easily distinguish it from other classes. Applying MIGDAC to the discovery of specific patterns of chemical compounds, we first represent each chemical compound as a graph and transform it into a set of hierarchical graphs. This not only represents more information than traditional formats but also simplifies the complex graph structures. We then apply MIGDAC to extract a set of class-specific patterns defined in terms of an interestingness measure and threshold computed with residue analysis. The next step is to use weight of evidence to estimate whether each identified class-specific pattern positively or negatively characterizes a class of drugs. Experiments on a drug dataset from the KEGG ligand database show that MIGDAC with the hierarchical graph representation greatly improves on the accuracy of traditional frequent graph mining algorithms.
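The abstract names residue analysis and weight of evidence but gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used adjusted-residual form for a pattern/class contingency table and a log-likelihood-ratio form of weight of evidence; the function names, the count-based inputs, and the 1.96 threshold are hypothetical illustrations, not definitions taken from the paper.

```python
import math


def adjusted_residual(n_pc, n_p, n_c, n):
    """Adjusted residual for a sub-graph pattern co-occurring with a class.

    Assumed inputs (hypothetical names):
      n_pc: compounds in the class that contain the pattern
      n_p:  compounds that contain the pattern (any class)
      n_c:  compounds in the class
      n:    total number of compounds
    """
    # Count expected if pattern and class were independent.
    expected = n_p * n_c / n
    # Standardized residual: deviation from expectation in sqrt(expected) units.
    standardized = (n_pc - expected) / math.sqrt(expected)
    # Estimated variance of the standardized residual.
    variance = (1 - n_p / n) * (1 - n_c / n)
    return standardized / math.sqrt(variance)


def weight_of_evidence(n_pc, n_p, n_c, n):
    """log2 of P(pattern | class) / P(pattern | other classes).

    Positive values suggest the pattern supports the class, negative
    values suggest it counts against it. Assumes both probabilities
    are non-zero; no smoothing is applied in this sketch.
    """
    p_in_class = n_pc / n_c
    p_outside = (n_p - n_pc) / (n - n_c)
    return math.log2(p_in_class / p_outside)


# Toy counts (made up for illustration): 100 compounds, 40 in the class,
# 30 containing the pattern, 24 of those inside the class.
d = adjusted_residual(24, 30, 40, 100)
w = weight_of_evidence(24, 30, 40, 100)
# Assumed threshold: |d| > 1.96 marks the pattern as interesting (95% level).
is_interesting = abs(d) > 1.96
```

With these toy counts the pattern appears in the class twice as often as independence would predict, so the residual exceeds the assumed threshold and the weight of evidence is positive.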
Keywords :
biology computing; data mining; drugs; graph theory; pattern classification; KEGG ligand database; MIGDAC; biological graph dataset; chemical compound classification; class-specific patterns; drug; graph mining algorithm; hierarchical graphs; residue analysis; subgraphs; Bioinformatics; Biology computing; Biomedical computing; Biomedical measurements; Chemical compounds; Data mining; Databases; Drugs; Graph theory; Pattern analysis; Graph mining; chemical compounds; classification; frequently occurring sub-graphs; interestingness measure; weight of evidence;
Conference_Title :
Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-0-7695-3452-7
DOI :
10.1109/BIBM.2008.36