DocumentCode :
2187746
Title :
Logical Optimization of Dataflows for Data Mining and Integration Processes
Author :
Wöhrer, Alexander ; Mehofer, Eduard ; Brezany, Peter
Author_Institution :
Dept. of Sci. Comput., Univ. of Vienna, Vienna, Austria
fYear :
2010
fDate :
7-10 Dec. 2010
Firstpage :
117
Lastpage :
122
Abstract :
Modern scientific collaborations require large-scale data mining and integration processes. Their investigations involve multi-disciplinary expertise and large-scale computational experiments on large amounts of data located in distributed data repositories that run various software systems and are managed by different organizations. Higher-level dataflow languages are used on top of parallel dataflow systems to enable faster program development and more maintainable code. Logical and physical optimization should be applied to a dataflow prior to its execution to improve performance. In this paper we present the rationale, theory, design, and application of logical optimization of dataflows for data mining and integration processes. A dataflow model is defined, and several optimization algorithms are developed, namely dead element elimination, process re-ordering, parallelization, and data by-passing. The first research prototype of the framework has been implemented in the context of the ADMIRE Data Mining and Integration Process Designer for logical optimization of specifications expressed in the DISPEL language, which was developed in the ADMIRE project.
Keywords :
data flow analysis; data mining; high level languages; optimisation; software engineering; ADMIRE data mining; DISPEL language; data by passing; dead element elimination; distributed data repository; higher level dataflow language; integration process designer; large scale computational experiment; logical optimization; multidisciplinary expertise; parallel dataflow system; process reordering; scientific collaboration; software system; Adaptation model; Computational modeling; Data mining; Data models; Distributed databases; Optimization; Process control; data-intensive research; dataflows; logical optimization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
e-Science Workshops, 2010 Sixth IEEE International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4244-8988-6
Electronic_ISBN :
978-0-7695-4295-9
Type :
conf
DOI :
10.1109/eScienceW.2010.28
Filename :
5693151