Title :
PaDDMAS: parallel and distributed data mining application suite
Author :
Rana, Omer ; Walker, David ; Li, Maozhen ; Lynden, Steven ; Ward, Mike
Author_Institution :
Dept. of Comput. Sci., Wales Univ., Cardiff, UK
Abstract :
Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms are employed to perform data analysis within a domain, ranging from statistical to machine learning and AI-based techniques. Several issues need to be addressed, however, to scale such approaches to large data sets, particularly when they are applied to data distributed across various sites. As new analysis techniques are identified, the core tool set must enable easy integration of such analytical components. Similarly, results from an analysis engine must be sharable, to enable storage, visualisation or further analysis of results. We describe the architecture of PaDDMAS, a component-based system for developing distributed data mining applications. PaDDMAS provides a tool set for combining pre-developed or custom components using a dataflow approach, with components performing analysis, data extraction, or data management and translation. Each component is wrapped as a Java/CORBA object and has an interface defined in XML. Components can be serial or parallel objects, and may be binary or contain a more complex internal structure. We demonstrate a prototype using a neural network analysis algorithm.
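The dataflow idea in the abstract, in which extraction, translation and analysis components are chained so that each one's output feeds the next, can be illustrated with a minimal Java sketch. This is not the PaDDMAS API: the `Component` interface, the `then` method and the three example components are hypothetical names chosen for illustration. CORBA wrapping and XML interface definitions are omitted, and a simple mean stands in for the neural network analysis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a dataflow pipeline of components; not the PaDDMAS API. */
public class DataflowSketch {

    /** Assumed component shape: one typed input, one typed output. */
    interface Component<I, O> {
        O process(I input);

        /** Connects this component's output to the next component's input. */
        default <R> Component<I, R> then(Component<O, R> next) {
            return input -> next.process(process(input));
        }
    }

    /** Data extraction: parses comma-separated text into numbers. */
    static final Component<String, List<Double>> EXTRACT = csv -> {
        List<Double> values = new ArrayList<>();
        for (String field : csv.split(",")) {
            values.add(Double.parseDouble(field.trim()));
        }
        return values;
    };

    /** Data translation: rescales values by dividing by the largest one. */
    static final Component<List<Double>, List<Double>> NORMALISE = values -> {
        double max = values.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
        if (max == 0.0) {
            return values; // nothing to rescale; avoids division by zero
        }
        List<Double> scaled = new ArrayList<>();
        for (double v : values) {
            scaled.add(v / max);
        }
        return scaled;
    };

    /** Analysis: arithmetic mean, a stand-in for a real mining algorithm. */
    static final Component<List<Double>, Double> MEAN = values ->
            values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);

    /** Wires the three components into one pipeline and runs it. */
    static double run(String csv) {
        return EXTRACT.then(NORMALISE).then(MEAN).process(csv);
    }

    public static void main(String[] args) {
        System.out.println(run("2, 4, 8"));
    }
}
```

Because every stage shares the same small interface, a new analysis component can be swapped in for `MEAN` without touching the extraction or translation stages, which is the kind of easy integration the abstract calls for.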
Keywords :
data analysis; data mining; information retrieval; learning (artificial intelligence); parallel processing; AI based techniques; CORBA object; Java; PaDDMAS; data analysis; data extraction; data management; data mining application suite; distributed data sets; machine learning; neural network analysis algorithm; visualisation; Artificial intelligence; Data analysis; Data mining; Engines; Java; Machine learning; Machine learning algorithms; Performance analysis; Visualization; XML;
Conference_Title :
Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International
Conference_Location :
Cancun
Print_ISBN :
0-7695-0574-0
DOI :
10.1109/IPDPS.2000.846010