• DocumentCode
    146558
  • Title

    Frequent item set generation based on transaction hashing

  • Author

    Agarwal, Jatin ; Singh, Ashutosh

  • Author_Institution
    AITEM, Amity Univ., Noida, India
  • fYear
    2014
  • fDate
    25-26 Sept. 2014
  • Firstpage
    182
  • Lastpage
    187
  • Abstract
    Hashing and pruning is a popular association rule mining technique for improving the performance of the traditional Apriori algorithm. It uses a hash function to reduce the size of the candidate item set. Direct Hashing & Pruning (DHP) and Perfect Hashing & Pruning (PHP) are the basic hashing algorithms, and researchers have proposed many others, each with its own pros and cons. The DHP algorithm suffers from collisions and requires additional database scans to count the frequency of collided item sets. The PHP algorithm eliminates the collision problem, but it increases the size of the hash table, which requires a large amount of memory, and it uses a complex hash function. The main objective of this paper is to reduce the number of collisions and the number of database scans needed to count the frequency of collided item sets, while ensuring that the size of the hash table does not increase. A new algorithm, Transaction Hashing and Pruning (THP), is proposed. THP arranges the item sets in vertical format and, after finding the bucket number of each candidate k-item set, hashes the transaction ids (TIDs) of that candidate item set into that bucket. THP overcomes the item set collision problem of DHP and the large hash table problem of PHP. Experimental results are also presented.
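    The abstract gives only an outline of THP, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It builds the vertical (item to TID set) layout, computes a bucket number for each candidate k-item set, and stores that candidate's TIDs in the bucket. The hash function `bucket_number`, the default of 7 buckets, the per-candidate bookkeeping inside each bucket, and the brute-force candidate enumeration are all assumptions made for the example; none of them is specified in the record.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction ids (TIDs) that contain it."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def bucket_number(itemset, item_rank, num_buckets):
    """Hypothetical hash function (an assumption, not taken from the paper)."""
    h = 0
    for item in itemset:
        h = h * 10 + item_rank[item]
    return h % num_buckets


def frequent_k_itemsets(transactions, k, min_support, num_buckets=7):
    """Return {candidate: support} for k-item sets meeting min_support."""
    vertical = to_vertical(transactions)
    items = sorted(vertical)
    item_rank = {item: i + 1 for i, item in enumerate(items)}

    # Assumption: each bucket keeps the TIDs per candidate, so two item sets
    # that hash to the same bucket remain distinguishable without a rescan.
    buckets = {b: {} for b in range(num_buckets)}
    # Assumption: candidates are enumerated exhaustively (no Apriori pruning).
    for candidate in combinations(items, k):
        tids = set.intersection(*(vertical[i] for i in candidate))
        if tids:
            b = bucket_number(candidate, item_rank, num_buckets)
            buckets[b][candidate] = tids

    # Support of a candidate is the number of TIDs hashed for it.
    frequent = {}
    for entries in buckets.values():
        for candidate, tids in entries.items():
            if len(tids) >= min_support:
                frequent[candidate] = len(tids)
    return frequent
```

    Because support is read directly from the stored TID sets, this sketch counts every candidate from a single pass over the vertical layout; whether the published algorithm organizes its buckets the same way is not stated in the abstract.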
  • Keywords
    data mining; file organisation; transaction processing; DHP algorithm; PHP algorithm; THP algorithm; TID; association rule mining technique; candidate item set size reduction; candidate-k item sets; complex hash function; direct hashing & pruning algorithm; frequent item set generation; hash function; hash table; memory space function; perfect hashing & pruning algorithm; transaction hashing and pruning algorithm; transaction id; Association rules; Clustering algorithms; Databases; Information technology; Memory management; Next generation networking; Association Rules; Data Mining; Hashing and Pruning; Transaction Hashing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference -
  • Conference_Location
    Noida
  • Print_ISBN
    978-1-4799-4237-4
  • Type

    conf

  • DOI
    10.1109/CONFLUENCE.2014.6949340
  • Filename
    6949340