Regional Information Center for Science and Technology (RICeST) - Using a hash-based method with transaction trimming for mining association rules

DocumentCode :

1331308

Title :

Using a hash-based method with transaction trimming for mining association rules

Author :

Park, Jong Soo ; Chen, Ming-Syan ; Yu, Philip S.

Author_Institution :

Dept. of Comput. Sci., Sungshin Women's Univ., Seoul, South Korea

Volume :

Issue :

fYear :

1997

Firstpage :

813

Lastpage :

825

Abstract :

We examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means, given a database of sales transactions, discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear together in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate sets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we develop an effective algorithm for candidate set generation. It is a hash-based algorithm and is especially effective for generating the candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby significantly reducing the computational cost for later iterations. The proposed algorithm also gives us the opportunity to reduce the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
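
The hash-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bucket_of` and `hash_filtered_pairs`, the toy hash function, the bucket count of 7, and the sample transactions are all assumptions made for illustration, and dropping non-large items in the second pass is only a crude stand-in for the transaction trimming the abstract mentions. The sketch counts single items and hashes every 2-item subset into a bucket during the first pass, then keeps a pair as a candidate only if both items are large and the pair's bucket count reaches the minimum support.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical toy hash for a pair of integer item ids (an assumption).
    return (pair[0] * 10 + pair[1]) % num_buckets


def hash_filtered_pairs(transactions, min_support, num_buckets=7):
    # Pass 1: count single items and hash every 2-item subset into a bucket.
    item_counts = Counter()
    buckets = [0] * num_buckets
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[bucket_of(pair, num_buckets)] += 1

    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # A pair is a candidate only if both items are large AND its bucket is
    # heavy enough. A bucket count is an upper bound on the support of every
    # pair hashed into it, so no large pair is ever discarded.
    candidates = {
        p for p in combinations(large_items, 2)
        if buckets[bucket_of(p, num_buckets)] >= min_support
    }

    # Pass 2: count only the surviving candidates. Items that are not large
    # are dropped first (a simplified form of trimming, assumed here).
    keep = set(large_items)
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t) & keep)
        for pair in combinations(items, 2):
            if pair in candidates:
                pair_counts[pair] += 1

    large_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return candidates, large_pairs


# Illustrative data only: four small transactions over integer item ids.
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
candidates, large_pairs = hash_filtered_pairs(transactions, min_support=2)
print(sorted(candidates))   # [(1, 3), (2, 3), (2, 5), (3, 5)]
print(large_pairs)
```

On this toy input, pairing all large items would give six candidate 2-itemsets, while the bucket filter leaves four. Because different pairs can collide in one bucket, the filter may keep some pairs that turn out not to be large, so the second counting pass is still needed.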

Keywords :

database theory; deductive databases; file organisation; knowledge acquisition; marketing; marketing data processing; sales management; software performance evaluation; transaction processing; very large databases; association rule mining; candidate set generation; computational cost; data mining performance; disk input-output; hash-based method; item association; large database; large itemsets; performance bottleneck; sales transactions; simulation study; transaction database size; transaction trimming; Association rules; Computational efficiency; Computational modeling; Credit cards; Data mining; Itemsets; Marketing and sales; Mining industry; Performance analysis; Transaction databases;

fLanguage :

English

Journal_Title :

Knowledge and Data Engineering, IEEE Transactions on

Publisher :

IEEE

ISSN :

1041-4347

Type :

jour

DOI :

10.1109/69.634757

Filename :

634757

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1331308