Title :
UNI3 - efficient algorithm for mining unordered induced subtrees using TMG candidate generation
Author :
Hadzic, Fedja ; Tan, Henry ; Dillon, Tharam S.
Author_Institution :
Fac. of Inf. Technol., Univ. of Technol. Sydney, NSW
Date :
March 1 2007-April 5 2007
Abstract :
Semi-structured data sources are increasingly used because they can represent information through more complex structures in which the semantics and relationships of data objects are more easily expressed. Extracting frequent substructures from such data has found important applications in areas such as bioinformatics, XML mining, Web mining, and scientific data management. This paper is concerned with mining frequent unordered induced subtrees from a database of rooted ordered labeled subtrees. Our previous work on frequent subtree mining is characterized by efficient tree model guided (TMG) candidate enumeration, in which candidate subtrees conform to the data's underlying tree structure. We apply the same approach to the unordered case, motivated by the fact that in many applications of frequent subtree mining the order among siblings is not considered important. The proposed UNI3 algorithm considers both transaction-based and occurrence-match support. Synthetic and real-world data are used to evaluate the time performance of our approach in comparison with well-known algorithms developed for the same problem.
Keywords :
data mining; tree data structures; UNI3 algorithm; data objects; frequent substructure extraction; information representation; semistructured data sources; tree isomorphism; tree model guided candidate generation; unordered induced subtree mining; Australia; Bioinformatics; Character generation; Computational intelligence; Data mining; Databases; Electronic mail; Information technology; Tree data structures; XML; canonical form; frequent subtree mining; induced unordered subtrees
Conference_Title :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368926