مرکز منطقه ای اطلاع رساني علوم و فناوري - Finding patterns in three-dimensional graphs: algorithms and applications to scientific data mining

DocumentCode :

787139

Title :

Finding patterns in three-dimensional graphs: algorithms and applications to scientific data mining

Author :

Xiong Wang ; Wang, Xiong ; Shasha, Dennis ; Shapiro, Bruce A. ; Rigoutsos, Isidore ; Zhang, Kaizhong

Author_Institution :

Dept. of Comput. Sci., California State Univ., Fullerton, CA, USA

Volume :

Issue :

fYear :

2002

Firstpage :

731

Lastpage :

749

Abstract :

Presents a method for finding patterns in 3D graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small number (specified by the user) of edit operations in the patterns or in the graph. (When a pattern appears in a graph only after the graph has been modified, we call that appearance "approximate occurrence.") The edit operations include relabeling a node, deleting a node and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications of them in scientific data mining. First, we apply the method to locating frequently occurring motifs in two families of proteins pertaining to RNA-directed DNA polymerase and thymidylate synthase and use the motifs to classify the proteins. Then, we apply the method to clustering chemical compounds pertaining to aromatic compounds, bicyclicalkanes and photosynthesis. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering

Keywords :

DNA; biology computing; data mining; graph theory; organic compounds; pattern classification; pattern clustering; proteins; relevance feedback; scientific information systems; 3D graph pattern finding; RNA-directed DNA polymerase; approximate occurrence; aromatic compounds; bicyclicalkanes; biochemistry; chemical compound clustering; edit operations; frequently occurring motifs; geometric hashing technique; graph edges; knowledge discovery; medicine; node deletion; node insertion; node labels; node relabeling; node triplets; photosynthesis; precision; protein classification; proteins; recall; rigid substructures; scientific data mining; structural pattern discovery; table label triplet compression; thymidylate synthase; undecomposable nodes; whole-structure rotations; whole-structure translations; Application software; Biochemistry; Biomedical imaging; Chemical compounds; Clustering algorithms; DNA; Data mining; Image coding; Polymers; Proteins;

fLanguage :

English

Journal_Title :

Knowledge and Data Engineering, IEEE Transactions on

Publisher :

ieee

ISSN :

1041-4347

Type :

jour

DOI :

10.1109/TKDE.2002.1019211

Filename :

1019211

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=787139