DocumentCode
2054879
Title
PaDDMAS: parallel and distributed data mining application suite
Author
Rana, Omer ; Walker, David ; Li, Maozhen ; Lynden, Steven ; Ward, Mike
Author_Institution
Dept. of Comput. Sci., Wales Univ., Cardiff, UK
fYear
2000
fDate
2000
Firstpage
387
Lastpage
392
Abstract
Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms are employed to perform data analysis within a domain, ranging from statistical to machine learning and AI-based techniques. Several issues need to be addressed, however, to scale such approaches to large data sets, particularly when they are applied to data distributed across various sites. As new analysis techniques are identified, the core tool set must enable easy integration of such analytical components. Similarly, results from an analysis engine must be sharable, to enable storage, visualisation or further analysis of results. We describe the architecture of PaDDMAS, a component-based system for developing distributed data mining applications. PaDDMAS provides a tool set for combining pre-developed or custom components using a dataflow approach, with components performing analysis, data extraction, or data management and translation. Each component is wrapped as a Java/CORBA object and has an interface defined in XML. Components can be serial or parallel objects, and may be binary or contain a more complex internal structure. We demonstrate a prototype using a neural network analysis algorithm.
Keywords
data analysis; data mining; information retrieval; learning (artificial intelligence); parallel processing; AI based techniques; CORBA object; Java; PaDDMAS; data analysis; data extraction; data management; data mining application suite; distributed data sets; machine learning; neural network analysis algorithm; visualisation; Artificial intelligence; Data analysis; Data mining; Engines; Java; Machine learning; Machine learning algorithms; Performance analysis; Visualization; XML;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International
Conference_Location
Cancun
Print_ISBN
0-7695-0574-0
Type
conf
DOI
10.1109/IPDPS.2000.846010
Filename
846010