DocumentCode :
146558
Title :
Frequent item set generation based on transaction hashing
Author :
Agarwal, Jatin ; Singh, Ashutosh
Author_Institution :
AITEM, Amity Univ., Noida, India
fYear :
2014
fDate :
25-26 Sept. 2014
Firstpage :
182
Lastpage :
187
Abstract :
Hashing & Pruning is a popular association rule mining technique for improving the performance of the traditional Apriori algorithm. The hashing technique uses a hash function to reduce the size of the candidate item set. Direct Hashing & Pruning (DHP) and Perfect Hashing & Pruning (PHP) are the basic hashing algorithms, and researchers have proposed many others; each has its own advantages and drawbacks. The DHP algorithm suffers from collisions and requires additional database scans to count the frequency of collided item sets. The PHP algorithm eliminates the collision problem, but it enlarges the hash table, which demands a large amount of memory, and it uses a complex hash function. The main objective of this paper is to reduce the number of collisions and the number of database scans needed to count the frequency of collided item sets, while ensuring that the size of the hash table does not grow. A new algorithm, Transaction Hashing and Pruning (THP), is proposed in this paper. THP arranges the item sets in vertical format; after computing the bucket number of each candidate k-item set, it hashes the transaction IDs (TIDs) of that candidate item set into that bucket. The THP algorithm thus overcomes the item set collision problem of DHP and the large hash table problem of PHP. Experimental results are also presented.
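The abstract contrasts DHP's per-bucket counters with THP's hashing of TIDs. The sketch below is only an illustrative interpretation under stated assumptions: this record gives no hash function, bucket layout, or pruning rule for THP. It assumes a toy database, a 7-bucket table, and a DHP-style modulo hash on 2-item sets, `(10*order(x) + order(y)) mod 7`. The names `TRANSACTIONS`, `NUM_BUCKETS`, `bucket_of`, `thp_table`, and `frequent_pairs` are hypothetical and do not come from the paper. DHP keeps one count per bucket, so colliding pairs inflate each other's count. The THP-style table keeps a separate TID set per candidate inside each bucket, which gives exact supports without rescanning the database while the number of buckets stays fixed.

```python
from itertools import combinations

# Hypothetical toy database (TID -> items); not data from the paper.
TRANSACTIONS = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
NUM_BUCKETS = 7  # assumed fixed hash-table size


def bucket_of(pair, order):
    """Assumed DHP-style hash for a 2-item set: (10*order(x) + order(y)) mod 7."""
    x, y = sorted(pair, key=order.get)
    return (10 * order[x] + order[y]) % NUM_BUCKETS


def to_vertical(transactions):
    """Convert horizontal TID -> items into vertical item -> set of TIDs."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def dhp_bucket_counts(transactions, order):
    """DHP: one counter per bucket, so colliding pairs share a single count."""
    counts = [0] * NUM_BUCKETS
    for items in transactions.values():
        for pair in combinations(sorted(items), 2):
            counts[bucket_of(pair, order)] += 1
    return counts


def thp_table(transactions, order):
    """THP-style sketch (assumed layout): each bucket maps candidate -> TID set."""
    vertical = to_vertical(transactions)
    table = [dict() for _ in range(NUM_BUCKETS)]
    for pair in combinations(sorted(vertical), 2):
        tids = vertical[pair[0]] & vertical[pair[1]]  # TIDs containing both items
        if tids:
            table[bucket_of(pair, order)][frozenset(pair)] = tids
    return table


def frequent_pairs(table, min_support):
    """Exact supports come from TID-set sizes; no extra database scan is needed."""
    return {
        tuple(sorted(cand)): len(tids)
        for bucket in table
        for cand, tids in bucket.items()
        if len(tids) >= min_support
    }


if __name__ == "__main__":
    order = {item: i + 1 for i, item in enumerate(sorted(to_vertical(TRANSACTIONS)))}
    print(dhp_bucket_counts(TRANSACTIONS, order))  # [3, 1, 2, 0, 3, 1, 3]
    print(frequent_pairs(thp_table(TRANSACTIONS, order), min_support=2))
```

With a minimum support of 2, the DHP counters let {A, D} and {C, D} pass as candidates, because they collide with {C, E} and {A, C}. The TID-set table reports only {A, C}, {B, C}, {B, E}, and {C, E} as frequent.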
Keywords :
data mining; file organisation; transaction processing; DHP algorithm; PHP algorithm; THP algorithm; TID; association rule mining technique; candidate item set size reduction; candidate-k item sets; complex hash function; direct hashing & pruning algorithm; frequent item set generation; hash function; hash table; memory space function; perfect hashing & pruning algorithm; transaction hashing and pruning algorithm; transaction id; Association rules; Clustering algorithms; Databases; Information technology; Memory management; Next generation networking; Association Rules; Data Mining; Hashing and Pruning; Transaction Hashing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Confluence: The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference
Conference_Location :
Noida
Print_ISBN :
978-1-4799-4237-4
Type :
conf
DOI :
10.1109/CONFLUENCE.2014.6949340
Filename :
6949340