Title :
Exploiting data lineage for parallel optimization in extensible DBMSs
Author :
Shek, Eddie C. ; Muntz, Richard R.
Author_Institution :
Inf. Sci. Lab., HRL Lab., Malibu, CA, USA
Abstract :
Extensibility and high query performance are important requirements of advanced large scale information systems since complex data analysis often requires the use of application-specific operations that have to be introduced by the user issuing the query. Towards the goal of supporting automatic parallelization of queries containing complex user-defined evaluators in an extensible DBMS, we devised a relevance window model to capture the inherent data lineage characteristics of evaluators on multidimensional data sets. Informally, the relevance window of an evaluator defines the scope of influence input data records have on the value of records in the output data space. An evaluator's relevance window constrains the data partitioning opportunities available for an evaluator
Keywords :
data analysis; parallel databases; query processing; data lineage; data partitioning; extensible databases; large scale information systems; multidimensional data sets; parallel query optimization; query performance; relevance window model; user-defined evaluators; Cloning; Computer science; Data analysis; Data mining; Database systems; Information systems; Laboratories; Large-scale systems; Multidimensional systems; Query processing
Conference_Title :
Proceedings of the 15th International Conference on Data Engineering, 1999
Conference_Location :
Sydney, NSW
Print_ISBN :
0-7695-0071-4
DOI :
10.1109/ICDE.1999.754936