Abstract :
This paper addresses alarm correlation in communication networks using data mining. Applying a sequential algorithm directly to distributed databases is ineffective because it incurs a large communication overhead. We propose an efficient algorithm, EDMA, that minimizes the number of candidate sets and exchanged messages through local and global pruning. At each local site, it runs an improved algorithm, CMatrix, to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm encodes every candidate (k+1)-itemset as a pair of those numbers, (x, y), which compresses the transmitted content and is used to query the corresponding support counts in CMatrix. Our solution also reduces the average transaction size and the size of the datasets, which in turn reduces scan time. The performance study shows that EDMA has higher running efficiency, lower communication cost, and better scalability than the direct application of a sequential algorithm to distributed databases.
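The pair encoding described above can be illustrated with a minimal sketch. The abstract does not specify how x and y are chosen, so this sketch assumes an Apriori-style join: x and y are the numbers of two frequent k-itemsets that share their first k-1 items, and their union is the candidate (k+1)-itemset. The function names (number_itemsets, encode_candidates, decode) are hypothetical, and the CMatrix lookup and the pruning steps are not modeled.

```python
from itertools import combinations


def number_itemsets(frequent_k):
    """Number the global frequent k-itemsets 1..m in a fixed (sorted) order.

    Assumption: every site sorts the same way, so the numbering agrees everywhere.
    """
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    return {itemset: n for n, itemset in enumerate(ordered, start=1)}


def encode_candidates(numbering):
    """Map each candidate (k+1)-itemset to a pair (x, y) of k-itemset numbers.

    Assumed join rule (Apriori-style): two k-itemsets that share their first
    k-1 items are merged into one candidate. No subset-based pruning is done here.
    """
    codes = {}
    for (a, x), (b, y) in combinations(sorted(numbering.items()), 2):
        if a[:-1] == b[:-1]:
            codes[(x, y)] = a + (b[-1],)
    return codes


def decode(code, numbering):
    """Recover the candidate itemset from its (x, y) code."""
    by_number = {n: s for s, n in numbering.items()}
    x, y = code
    return tuple(sorted(set(by_number[x]) | set(by_number[y])))


# Hypothetical frequent 2-itemsets over alarm types A..D.
frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
numbering = number_itemsets(frequent_2)
codes = encode_candidates(numbering)
```

Under these assumptions a site transmits two small integers per candidate instead of k+1 item identifiers, which is one way the exchanged messages could shrink as k grows.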
Keywords :
data mining; distributed databases; matrix algebra; message passing; CMatrix algorithm; EDMA algorithm; algorithm codes; association rule mining; communication network; distributed database; global pruning; local pruning; message exchange; Association rules; Communication networks; Costs; Data engineering; Data mining; Distributed databases; Itemsets; Knowledge engineering; Paper technology; Partitioning algorithms;