• DocumentCode
    2054879
  • Title

    PaDDMAS: parallel and distributed data mining application suite

  • Author

    Rana, Omer ; Walker, David ; Li, Maozhen ; Lynden, Steven ; Ward, Mike

  • Author_Institution
    Dept. of Comput. Sci., Wales Univ., Cardiff, UK
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    387
  • Lastpage
    392
  • Abstract
    Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms are employed to perform data analysis within a domain, and range from statistical to machine learning and AI based techniques. Several issues need to be addressed, however, to scale such approaches to large data sets, particularly when these are applied to data distributed at various sites. As new analysis techniques are identified, the core tool set must enable easy integration of such analytical components. Similarly, results from an analysis engine must be sharable, to enable storage, visualisation or further analysis of results. We describe the architecture of PaDDMAS, a component based system for developing distributed data mining applications. PaDDMAS provides a tool set for combining pre-developed or custom components using a dataflow approach, with components performing analysis, data extraction or data management and translation. Each component is wrapped as a Java/CORBA object, and has an interface defined in XML. Components can be serial or parallel objects, and may be binary or contain a more complex internal structure. We demonstrate a prototype using a neural network analysis algorithm.
  • Keywords
    data analysis; data mining; information retrieval; learning (artificial intelligence); parallel processing; AI based techniques; CORBA object; Java; PaDDMAS; data analysis; data extraction; data management; data mining application suite; distributed data sets; machine learning; neural network analysis algorithm; visualisation; Artificial intelligence; Data analysis; Data mining; Engines; Java; Machine learning; Machine learning algorithms; Performance analysis; Visualization; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International
  • Conference_Location
    Cancun
  • Print_ISBN
    0-7695-0574-0
  • Type

    conf

  • DOI
    10.1109/IPDPS.2000.846010
  • Filename
    846010