DocumentCode :
2360653
Title :
New parallel algorithms for frequent itemset mining in very large databases
Author :
Veloso, Adriano ; Meira, Wagner, Jr. ; Parthasarathy, Srinivasan
Author_Institution :
Comput. Sci. Dept., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
fYear :
2003
fDate :
10-12 Nov. 2003
Firstpage :
158
Lastpage :
166
Abstract :
Frequent itemset mining is a classic problem in data mining. It is an unsupervised process concerned with finding frequent patterns (or itemsets) hidden in large volumes of data in order to produce compact summaries or models of the database. These models are typically used to generate association rules, but recently they have also been used in far-reaching domains such as e-commerce and bioinformatics. Because databases are growing in both dimension (number of attributes) and size (number of records), one of the main issues for a frequent itemset mining algorithm is its ability to analyze very large databases. Sequential algorithms lack this ability, especially in terms of run-time performance, for such very large databases. Therefore, we must rely on high-performance parallel and distributed computing. We present new parallel algorithms for frequent itemset mining. Their efficiency is demonstrated through a series of experiments on different parallel environments, ranging from shared-memory multiprocessor machines to a set of SMP clusters connected through a high-speed network. We also briefly discuss an application of our algorithms to the analysis of large databases collected by a Brazilian Web portal.
Keywords :
data mining; parallel algorithms; shared memory systems; very large databases; Brazilian Web portal; SMP clusters; database model; distributed computing; frequent itemset mining; shared-memory multiprocessors machines; Algorithm design and analysis; Association rules; Data analysis; Databases; High-speed networks; Itemsets; Runtime
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture and High Performance Computing, 2003. Proceedings. 15th Symposium on
Print_ISBN :
0-7695-2046-4
Type :
conf
DOI :
10.1109/CAHPC.2003.1250334
Filename :
1250334