DocumentCode
3104837
Title
delta-Tolerance Closed Frequent Itemsets
Author
Cheng, James ; Ke, Yiping ; Ng, Wilfred
Author_Institution
Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong
fYear
2006
fDate
18-22 Dec. 2006
Firstpage
139
Lastpage
148
Abstract
In this paper, we study an inherent problem of mining frequent itemsets (FIs): the number of FIs mined is often too large. The large number of FIs not only affects the mining performance, but also severely thwarts the application of FI mining. In the literature, Closed FIs (CFIs) and Maximal FIs (MFIs) are proposed as concise representations of FIs. However, the number of CFIs is still too large in many cases, while MFIs lose information about the frequency of the FIs. To address this problem, we relax the restrictive definition of CFIs and propose the delta-Tolerance CFIs (delta-TCFIs). Mining delta-TCFIs recursively removes all subsets of a delta-TCFI that fall within a frequency distance bounded by delta. We propose two algorithms, CFI2TCFI and MineTCFI, to mine delta-TCFIs. CFI2TCFI achieves very high accuracy on the estimated frequency of the recovered FIs but is less efficient when the number of CFIs is large, since it is based on CFI mining. MineTCFI is significantly faster and consumes less memory than the algorithms of the state-of-the-art concise representations of FIs, while the accuracy of MineTCFI is only slightly lower than that of CFI2TCFI.
Keywords
data mining; closed frequent itemset mining; delta-tolerance; maximal frequent itemset mining; Association rules; Computer science; Data analysis; Data mining; Error analysis; Frequency estimation; Indexing; Itemsets; Pattern analysis; Transaction databases;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location
Hong Kong
ISSN
1550-4786
Print_ISBN
0-7695-2701-7
Type
conf
DOI
10.1109/ICDM.2006.1
Filename
4053042