DocumentCode :
2548960
Title :
Finding Correlated Item Pairs through Efficient Pruning with a Given Threshold
Author :
Wang, Bo ; Su, Liang ; Li, Aiping ; Zou, Peng
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha
fYear :
2008
fDate :
20-22 July 2008
Firstpage :
413
Lastpage :
420
Abstract :
Given a minimum threshold on a massive market-basket data set, an item pair whose correlation is above the threshold is considered correlated. In this paper, we present SERIT (Searching-corrElated-pair Randomized algorithm for dIfferent Thresholds), a randomized algorithm that finds all correlated pairs effectively, using Pearson's correlation coefficient [11] as the correlation measure. In their CIKM'06 paper [2], Zhang et al. address the same problem by exploiting the relationship between Pearson's coefficient and Jaccard distance. However, their approach is inefficient when the threshold is small. Building on [2], we propose a new probability function for pruning uncorrelated item pairs that overcomes this shortcoming. Experimental results on synthetic and real data sets show that, for a given threshold, even a small one, SERIT prunes unwanted item pairs efficiently and saves substantial computational resources.
Keywords :
correlation methods; data mining; probability; randomised algorithms; very large databases; Jaccard distance; Pearson correlation coefficient; SERIT algorithm; data mining; data pruning; massive market-basket data set; minimum threshold; probability function; searching-correlated-item-pair randomized algorithm; Association rules; Bioinformatics; Data mining; Data models; Information management; Itemsets; Power measurement; Public healthcare; Time measurement; Upper bound; Pearson's coefficient; correlated item; min-hash function; statistical correlation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
Type :
conf
DOI :
10.1109/WAIM.2008.84
Filename :
4597042