Title :
Array-Tree: A persistent data structure to compactly store frequent itemsets
Author :
Baralis, Elena ; Cerquitelli, Tania ; Chiusano, Silvia ; Grand, Alberto
Author_Institution :
Dipt. di Autom. e Inf., Politec. di Torino, Torino, Italy
Abstract :
Frequent itemset mining discovers correlations among data items in a transactional dataset. A large number of itemsets is often extracted, which makes the result hard to process and analyze. The efficient management of extracted frequent itemsets is still an open research issue. This paper presents a new persistent structure, the Array-Tree, that compactly stores frequent itemsets. It is an array-based structure that exploits both prefix-path sharing and subtree sharing to reduce data replication in the tree, thus increasing its compactness. The Array-Tree can be used to efficiently query the extracted itemsets by enforcing user-defined item or support constraints. Experiments performed on real and synthetic datasets show both the compactness of the Array-Tree data representation and its efficient support for user queries.
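The prefix-path sharing and constraint-based querying mentioned in the abstract can be illustrated with a minimal sketch. This is not the Array-Tree itself: it is an ordinary in-memory prefix tree, with no array layout, no subtree sharing, and no persistence. The names `ItemsetTrie`, `insert`, `node_count`, and `query` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Node:
    # Support of the itemset spelled by the path from the root to this node.
    # None means the path is only a prefix, not a stored itemset.
    support: Optional[int] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


class ItemsetTrie:
    """Illustrative prefix tree (an assumption, not the paper's structure)."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, itemset: Iterable[str], support: int) -> None:
        # Items are sorted so that itemsets with a common prefix
        # share the same path from the root.
        node = self.root
        for item in sorted(itemset):
            node = node.children.setdefault(item, Node())
        node.support = support

    def node_count(self) -> int:
        # Number of item nodes actually stored (the root is excluded).
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count - 1

    def query(
        self, must_contain: Iterable[str] = (), min_support: int = 0
    ) -> List[Tuple[Tuple[str, ...], int]]:
        # Return stored itemsets that include every required item
        # (item constraint) and reach min_support (support constraint).
        required = frozenset(must_contain)
        results = []
        stack = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            if (
                node.support is not None
                and node.support >= min_support
                and required <= set(path)
            ):
                results.append((path, node.support))
            for item, child in node.children.items():
                stack.append((child, path + (item,)))
        return sorted(results)
```

In this sketch, storing {a}, {a, b}, {a, b, c}, {a, c}, and {b} takes five nodes, whereas listing the itemsets separately would repeat items nine times; this is the kind of replication that prefix sharing avoids.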
Keywords :
arrays; constraint handling; data mining; tree data structures; array tree data representation; data reduce; frequent itemset mining; persistent data structure; prefix path sharing; subtree sharing; synthetic dataset; Costs; Data mining; Data structures; Information retrieval; Itemsets; Performance evaluation; Query processing; Refining; Tree data structures;
Conference_Title :
Intelligent Systems (IS), 2010 5th IEEE International Conference
Conference_Location :
London
Print_ISBN :
978-1-4244-5163-0
Electronic_ISBN :
978-1-4244-5164-7
DOI :
10.1109/IS.2010.5548388