DocumentCode
3043329
Title
Parallel mining of association rules from text databases on a cluster of workstations
Author
Holt, John D. ; Chung, Soon M.
Author_Institution
Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
fYear
2004
fDate
26-30 April 2004
Firstpage
86
Abstract
Summary form only given. We propose a new algorithm named Parallel Multipass with Inverted Hashing and Pruning (PMIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. The new PMIHP algorithm is a parallel version of our Multipass with Inverted Hashing and Pruning (MIHP) algorithm, which was shown to be more efficient than other existing algorithms in the context of mining text databases. The PMIHP algorithm reduces the overhead of communication between miners running on different processors because the miners mine their local databases asynchronously and prune the global candidates by using the inverted hashing and pruning technique.
Keywords
data mining; distributed databases; file organisation; parallel algorithms; text analysis; workstation clusters; Parallel Multipass with Inverted Hashing and Pruning algorithm; association rules; parallel mining; retail transaction databases; text databases; Association rules; Clustering algorithms; Computer science; Context; Data engineering; Data mining; Itemsets; Linux; Transaction databases; Workstations
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International
Print_ISBN
0-7695-2132-0
Type
conf
DOI
10.1109/IPDPS.2004.1303027
Filename
1303027