DocumentCode :
3269013
Title :
Unordered tree mining with applications to phylogeny
Author :
Shasha, Dennis ; Wang, Jason T L ; Zhang, Sen
Author_Institution :
Courant Inst. of Math. Sci., New York Univ., NY, USA
fYear :
2004
fDate :
30 March-2 April 2004
Firstpage :
708
Lastpage :
719
Abstract :
Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. We present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
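The abstract states the O(|T|^2) bound but not the procedure. The Python sketch below shows one straightforward way to enumerate labeled cousin pairs within that bound: every unordered node pair has exactly one lowest common ancestor, so grouping descendants by depth under each ancestor visits each pair at most once. It assumes a simplified cousin distance (two nodes both k levels below their lowest common ancestor have distance k - 1, so siblings are 0 and first cousins are 1) and only counts pairs at equal depth. The paper's exact definition of cousin distance and of which pairs are "interesting" may differ. The names `cousin_pairs`, `common_pairs`, `max_dist`, and `min_trees` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cousin_pairs(children, labels, root, max_dist=1):
    """Count labeled cousin pairs in a rooted unordered tree (illustrative sketch).

    children: dict node -> list of child nodes (child order is irrelevant)
    labels:   dict node -> label
    Assumed cousin distance: nodes both k levels below their lowest common
    ancestor have distance k - 1 (siblings 0, first cousins 1, ...).
    Returns a Counter keyed by (label_a, label_b, distance), label_a <= label_b.
    """
    counts = Counter()

    def by_depth(node):
        # Map relative depth -> descendants at that depth (node itself at depth 0).
        levels = defaultdict(list)
        stack = [(node, 0)]
        while stack:
            n, d = stack.pop()
            levels[d].append(n)
            for c in children.get(n, []):
                stack.append((c, d + 1))
        return levels

    stack = [root]
    while stack:
        anc = stack.pop()
        kids = children.get(anc, [])
        stack.extend(kids)
        # Nodes taken from two different child subtrees have anc as their LCA.
        subtree_levels = [by_depth(c) for c in kids]
        for k in range(1, max_dist + 2):
            # Nodes k levels below anc sit at depth k - 1 inside a child subtree.
            groups = [lv.get(k - 1, []) for lv in subtree_levels]
            for g1, g2 in combinations(groups, 2):
                for a in g1:
                    for b in g2:
                        la, lb = sorted((labels[a], labels[b]))
                        counts[(la, lb, k - 1)] += 1
    return counts


def common_pairs(trees, max_dist=1, min_trees=2):
    """Hypothetical helper: cousin-pair keys present in at least min_trees trees."""
    support = Counter()
    for children, labels, root in trees:
        # Count each key once per tree, regardless of how often it occurs there.
        support.update(set(cousin_pairs(children, labels, root, max_dist)))
    return {key for key, n in support.items() if n >= min_trees}
```

For example, in a tree whose root has children labeled X and Y, where X has children A and B and Y has child C, `cousin_pairs(..., max_dist=1)` reports the sibling pairs (X, Y, 0) and (A, B, 0) and the first-cousin pairs (A, C, 1) and (B, C, 1). `common_pairs` hints at the multi-tree application mentioned in the abstract by keeping only keys shared across several trees.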
Keywords :
data mining; graph theory; pattern recognition; tree data structures; Web log analysis; XML processing; bioinformatics; co-occurring pattern; cousin pair; free tree; frequent structure mining; kernel tree; multiple evolutionary tree; pattern discovery; pattern extraction; phylogeny; rooted unordered labeled trees; structural data; undirected acyclic graph; unordered tree mining; Bioinformatics; Data mining; Educational institutions; History; Kernel; Organisms; Phylogeny; Scalability; Tree graphs; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN :
1063-6382
Print_ISBN :
0-7695-2065-0
Type :
conf
DOI :
10.1109/ICDE.2004.1320039
Filename :
1320039