• DocumentCode
    2489998
  • Title

    Scalable data mining with log based consistency DSM for high performance distributed computing

  • Author

    Hirayama, Hideaki ; Honda, Hiroki ; Yuba, Toshitsugu

  • Author_Institution
    Graduate Sch. of Inf. Syst., Univ. of Electro-Commun., Tokyo, Japan
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    143
  • Lastpage
    150
  • Abstract
    Mining large Web-based online distributed databases to discover new knowledge and achieve financial gain is an important research problem. These computations require high performance distributed and parallel computing environments. Traditional data mining techniques such as classification, association, and clustering can be extended to find new efficient solutions. The paper presents the scalable data mining problem, proposes the use of software DSM (distributed shared memory) with a new mechanism as an effective solution, and discusses both the implementation and the performance evaluation results. It is observed that the overhead of a software DSM is very large for scalable data mining programs. A new Log Based Consistency (LBC) mechanism, designed specifically for scalable data mining on the software DSM, is proposed to overcome this overhead. Traditional association rule based data mining programs frequently modify the same fields by count-up operations. The LBC mechanism maintains consistency for such updates by broadcasting the count-up operation logs among the multiple nodes.
  • Keywords
    data integrity; data mining; distributed databases; distributed shared memory systems; information resources; very large databases; LBC mechanism; Web based online distributed database mining; association rule based data mining programs; count-up operation logs; count-up operations; data mining techniques; distributed shared memory; high performance distributed computing; log based consistency DSM; multiple nodes; parallel computing environments; performance evaluation results; research problem; scalable data mining; scalable data mining problem; scalable data mining programs; software DSM; Association rules; Bayesian methods; Broadcasting; Clustering algorithms; Data mining; Databases; Distributed computing; High performance computing; Information systems; Software performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering of Complex Computer Systems, 2000. ICECCS 2000. Proceedings. Sixth IEEE International Conference on
  • Conference_Location
    Tokyo
  • Print_ISBN
    0-7695-0583-X
  • Type

    conf

  • DOI
    10.1109/ICECCS.2000.873938
  • Filename
    873938
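
The abstract describes LBC as keeping replicas consistent by broadcasting count-up operation logs among nodes. The following is a minimal sketch of that idea only, not the authors' implementation: the Node class, the synchronize function, and the sample transactions are all hypothetical names and data introduced for illustration, and the "broadcast" is simulated in a single process.

```python
from collections import Counter


class Node:
    """One node holding a local replica of the shared counters (hypothetical)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = Counter()  # local replica of the shared count fields
        self.log = []            # count-up operations not yet broadcast

    def count_up(self, item, amount=1):
        # Apply the increment locally and record it in the operation log.
        self.counts[item] += amount
        self.log.append((item, amount))

    def apply_remote(self, log):
        # Replay another node's count-up log. Additions commute, so the
        # order in which logs arrive does not change the final counts.
        for item, amount in log:
            self.counts[item] += amount


def synchronize(nodes):
    """Broadcast each node's pending log to every other node, then clear it."""
    pending = [(node, list(node.log)) for node in nodes]
    for sender, log in pending:
        for receiver in nodes:
            if receiver is not sender:
                receiver.apply_remote(log)
        sender.log.clear()


# Illustrative data: each transaction is counted on one of three nodes.
nodes = [Node(i) for i in range(3)]
transactions = [["bread", "milk"], ["bread"], ["milk", "eggs"], ["bread", "eggs"]]
for i, items in enumerate(transactions):
    for item in items:
        nodes[i % len(nodes)].count_up(item)

synchronize(nodes)
```

After synchronize returns, every node's counts hold the same totals. The sketch relies on increments being commutative, which is why replaying logs in any order yields the same result on every replica.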