DocumentCode :
2081619
Title :
Parallel association rule mining with minimum inter-processor communication
Author :
El-Hajj, Mohammad ; Zaïane, Osmar R.
Author_Institution :
Dept. of Comput. Sci., Univ. of Alberta, Edmonton, Canada
fYear :
2003
fDate :
1-5 Sept. 2003
Firstpage :
519
Lastpage :
523
Abstract :
Existing parallel association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is that most parallel algorithms for a shared-nothing environment are Apriori-based. Apriori-based algorithms have been shown not to scale, mainly because of (1) repeated disk I/O scans and (2) the heavy computation and communication involved in candidacy generation. This paper proposes a new disk-based parallel association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout, also called Inverted Matrix, that avoids multiple scans of the database during the mining phase: globally frequent patterns can be found in less than a full scan using random access. This data structure is replicated among the parallel nodes. Second, for each frequent item assigned to a parallel node, a relatively small independent tree is built that summarizes co-occurrences. Finally, a simple, non-recursive mining process reduces memory requirements, since only minimal candidacy generation and counting are needed, and no communication between nodes is required to generate all globally frequent patterns.
Keywords :
data mining; data structures; decision trees; multiprocessor interconnection networks; parallel algorithms; parallel databases; pattern clustering; replicated databases; Inverted Matrix; apriori based algorithms; candidacy generation; cooccurrence summarization; data structure replication; database layout; disk-based rule mining; independent tree; interprocessor communication; inverted matrix; memory requirement; multiple scanning; parallel algorithm; parallel association rule mining; parallel node; parallel nodes; pattern finding; repetitive I/O disk scans; transactional datasets; Association rules; Business communication; Concurrent computing; Costs; Data mining; Data structures; Explosives; Matrix converters; Parallel algorithms; Transaction databases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on
ISSN :
1529-4188
Print_ISBN :
0-7695-1993-8
Type :
conf
DOI :
10.1109/DEXA.2003.1232075
Filename :
1232075