Regional Information Center for Science and Technology - On Finding Similar Items in a Stream of Transactions

DocumentCode :

2191002

Title :

On Finding Similar Items in a Stream of Transactions

Author :

Campagna, Andrea ; Pagh, Rasmus

Author_Institution :

IT, Univ. of Copenhagen, Copenhagen, Denmark

fYear :

2010

fDate :

13 Dec. 2010

Firstpage :

121

Lastpage :

128

Abstract :

While there has been a lot of work on finding frequent item sets in transaction data streams, none of these solve the problem of finding similar pairs according to standard similarity measures. This paper is a first attempt at dealing with this, arguably more important, problem. We start out with a negative result that also explains the lack of theoretical upper bounds on the space usage of data mining algorithms for finding frequent item sets: any algorithm that (even only approximately and with a chance of error) finds the most frequent k-item set must use space Ω(min{mb, n^k, (mb/φ)^k}) bits, where mb is the number of items in the stream so far, n is the number of distinct items, and φ is a support threshold. To achieve any non-trivial space upper bound we must thus abandon a worst-case assumption on the data stream. We work under the model that the transactions come in random order, and show that, surprisingly, not only is small-space similarity mining possible for the most common similarity measures, but the mining accuracy improves with the length of the stream for any fixed support threshold.
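To make the problem in the abstract concrete, the sketch below computes exact Jaccard and cosine similarity for every co-occurring item pair in a list of transactions. This is a naive baseline, not the algorithm from the paper: it keeps one counter per item and one per co-occurring pair, so its memory can grow like the n^k term of the lower bound (with k = 2). The function names `pair_similarities` and `most_similar` are hypothetical, and items are assumed to be mutually comparable (for example, all strings) so that pairs can be stored in sorted order.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pair_similarities(transactions):
    """Exact Jaccard and cosine similarity for every co-occurring item pair.

    Hypothetical baseline: memory holds one counter per distinct item and
    one per co-occurring pair, so it may grow quadratically in the number
    of distinct items.
    """
    item_count = Counter()  # item -> number of transactions containing it
    pair_count = Counter()  # (a, b) with a < b -> transactions containing both
    for t in transactions:
        items = sorted(set(t))  # ignore repeats within one transaction
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    result = {}
    for (a, b), both in pair_count.items():
        ca, cb = item_count[a], item_count[b]
        jaccard = both / (ca + cb - both)  # |A ∩ B| / |A ∪ B|
        cosine = both / sqrt(ca * cb)      # |A ∩ B| / sqrt(|A| |B|)
        result[(a, b)] = (jaccard, cosine)
    return result


def most_similar(transactions, threshold):
    """Pairs with Jaccard similarity >= threshold, most similar first."""
    sims = pair_similarities(transactions)
    hits = [(pair, j) for pair, (j, _) in sims.items() if j >= threshold]
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    stream = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b"}]
    print(most_similar(stream, 0.5))
```

For the four-transaction stream in the usage example, items a and b occur together in three of the four transactions that contain either of them, giving a Jaccard similarity of 0.75, while b and c reach 0.5. A streaming algorithm in the paper's setting would aim to report such pairs without storing every pair counter.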

Keywords :

data mining; sampling methods; set theory; transaction processing; association rule; transaction data streaming; algorithms; association rules; sampling; streaming;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining Workshops (ICDMW), 2010 IEEE International Conference on

Conference_Location :

Sydney, NSW

Print_ISBN :

978-1-4244-9244-2

Electronic_ISBN :

978-0-7695-4257-7

Type :

conf

DOI :

10.1109/ICDMW.2010.152

Filename :

5693291

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2191002