DocumentCode
146558
Title
Frequent item set generation based on transaction hashing
Author
Agarwal, Jatin ; Singh, Ashutosh
Author_Institution
AITEM, Amity Univ., Noida, India
fYear
2014
fDate
25-26 Sept. 2014
Firstpage
182
Lastpage
187
Abstract
Hashing and pruning is a popular association rule mining technique for improving the performance of the traditional Apriori algorithm. The hashing technique uses a hash function to reduce the size of the candidate item set. Direct Hashing & Pruning (DHP) and Perfect Hashing & Pruning (PHP) are the basic hashing algorithms, and researchers have proposed many others, each with its own pros and cons. The DHP algorithm suffers from collisions and requires additional database scans to count the frequency of collided item sets. The PHP algorithm eliminates the collision problem, but it increases the size of the hash table, which requires a large amount of memory, and it uses a complex hash function. The main objective of this paper is to reduce the number of collisions and the number of database scans needed to count the frequency of collided item sets, while ensuring that the size of the hash table does not increase. This paper proposes a new algorithm, Transaction Hashing and Pruning (THP). THP arranges the item sets in vertical format; after finding the bucket number of each candidate k-item set, it hashes the transaction ID (TID) of that candidate item set into that bucket. The THP algorithm thereby overcomes the item set collision problem of DHP and the large hash table problem of PHP. Experimental results are also presented.
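The following is a minimal sketch of the idea described in the abstract, not the authors' implementation. Several details are assumptions: the DHP-style hash function `bucket_of` (combine item ranks, then take the result modulo the bucket count), the fixed `num_buckets=7`, building candidates only from frequent single items, and storing a per-candidate TID set inside each bucket. All function names are hypothetical. The point it illustrates is that a bucket holding TIDs per candidate keeps colliding candidates distinguishable, so support can be read off the TID count without rescanning the database, while the number of buckets stays fixed.

```python
from collections import defaultdict
from itertools import combinations


def vertical_format(transactions):
    """Map each item to the set of TIDs whose transaction contains it."""
    tidsets = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def bucket_of(itemset, order, num_buckets):
    # Hypothetical DHP-style hash: fold the item ranks into one number, then mod.
    value = 0
    for item in itemset:
        value = value * 10 + order[item]
    return value % num_buckets


def thp_frequent_k(transactions, k, min_support, num_buckets=7):
    tidsets = vertical_format(transactions)
    order = {item: rank for rank, item in enumerate(sorted(tidsets), start=1)}
    # Assumption: candidates are built only from frequent single items (Apriori pruning).
    frequent_items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    # Fixed-size table; each bucket maps candidate -> TIDs, so collisions do not merge counts.
    buckets = [defaultdict(set) for _ in range(num_buckets)]
    for candidate in combinations(frequent_items, k):
        tids = set.intersection(*(tidsets[i] for i in candidate))
        if tids:
            buckets[bucket_of(candidate, order, num_buckets)][candidate] |= tids
    # Prune: keep candidates whose TID count reaches the minimum support.
    return {
        cand: len(tids)
        for bucket in buckets
        for cand, tids in bucket.items()
        if len(tids) >= min_support
    }
```

For example, with the four transactions `{100: "ACD", 200: "BCE", 300: "ABCE", 400: "BE"}`, `k=2` and `min_support=2`, the sketch returns the frequent 2-item sets AC, BC, BE and CE with their TID counts.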
Keywords
data mining; file organisation; transaction processing; DHP algorithm; PHP algorithm; THP algorithm; TID; association rule mining technique; candidate item set size reduction; candidate-k item sets; complex hash function; direct hashing & pruning algorithm; frequent item set generation; hash function; hash table; memory space function; perfect hashing & pruning algorithm; transaction hashing and pruning algorithm; transaction id; Association rules; Clustering algorithms; Databases; Information technology; Memory management; Next generation networking; Association Rules; Data Mining; Hashing and Pruning; Transaction Hashing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference
Conference_Location
Noida
Print_ISBN
978-1-4799-4237-4
Type
conf
DOI
10.1109/CONFLUENCE.2014.6949340
Filename
6949340