Title :
Online Algorithms for Complete Itemset Counts Using Set-to-String Mappings
Author :
Jawad, Ahmed ; Karim, Asim ; Khan, Imdadullah
Author_Institution :
Dept. of Comput. Sci., Lahore Univ. of Manage. Sci.
Abstract :
We present two algorithms for maintaining the exact counts of all itemsets over a stream of transactions. The count of each subset of a transaction is maintained by mapping the subset to substrings of the alphabet. This technique allows time-efficient processing of the items in a single pass over the data stream. The two algorithms differ in their mapping schemes and data structures. The first algorithm performs a prefix-based scan of each transaction, taken as a Boolean string, while the second algorithm improves on the first by exploiting a suffix-tree-like structure for online enumeration of transaction suffixes. Correctness proofs and theoretical bounds on the time and space complexity of these algorithms are presented. The algorithms are implemented and evaluated on several synthetic datasets. The results confirm that these algorithms are well suited to association rule mining over data streams for many practical business applications.
Keywords :
computational complexity; data mining; transaction processing; tree data structures; Boolean string; association rule mining; business applications; complete itemset counts; online algorithms; online enumeration; prefix-based-scan technique; set-to-string mappings; space complexity; suffix tree like structure; synthetic datasets; time complexity; transaction suffixes; Algorithm design and analysis; Computer science; Data analysis; Data mining; Data structures; Failure analysis; Frequency; Itemsets; Pattern analysis; Transaction databases;
Conference_Title :
Multitopic Conference, 2006. INMIC '06. IEEE
Conference_Location :
Islamabad
Print_ISBN :
1-4244-0795-8
Electronic_ISBN :
1-4244-0795-8
DOI :
10.1109/INMIC.2006.358185