• Title of article

    Scheduling strategies for efficient ETL execution

  • Author/Authors

    Anastasios Karagiannis, Panos Vassiliadis, Alkis Simitsis

  • Issue Information
    Journal with serial number, year 2013
  • Pages
    19
  • From page
    927
  • To page
    945
  • Abstract
    Extract-transform-load (ETL) workflows model the population of enterprise data warehouses with information gathered from a large variety of heterogeneous data sources. ETL workflows are complex design structures that run under strict performance requirements and their optimization is crucial for satisfying business objectives. In this paper, we deal with the problem of scheduling the execution of ETL activities (a.k.a. transformations, tasks, operations), with the goal of minimizing ETL execution time and allocated memory. We investigate the effects of four scheduling policies on different flow structures and configurations and experimentally show that the use of different scheduling policies may improve ETL performance in terms of memory consumption and execution time. First, we examine a simple, fair scheduling policy. Then, we study the pros and cons of two other policies: the first opts for emptying the largest input queue of the flow and the second for activating the operation (a.k.a. activity) with the maximum tuple consumption rate. Finally, we examine a fourth policy that combines the advantages of the latter two in synergy with flow parallelization.
  • Keywords
    Record linkage, Entity resolution, Data matching, Data quality, Privacy techniques, Survey
  • Journal title
    Information Systems
  • Serial Year
    2013
  • Record number

    1230337
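
The abstract names three basic selection rules: a simple fair policy, emptying the largest input queue, and activating the operation with the maximum tuple consumption rate. The following is a minimal sketch of how such rules might choose the next activity to run. It is an illustration only, not the authors' implementation: the `Activity` class, its fixed `rate` field, the function names, and the use of round-robin as a stand-in for the "fair" policy are all assumptions. The fourth, combined policy with flow parallelization is not sketched.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Hypothetical ETL activity with an input queue of pending tuples."""
    name: str
    rate: int  # assumed: tuples consumed per activation, fixed per activity
    queue: deque = field(default_factory=deque)


def ready_indices(activities):
    """Indices of activities that have at least one tuple waiting."""
    return [i for i, a in enumerate(activities) if a.queue]


def pick_round_robin(activities, last_index):
    """Fair policy (assumed round-robin): next non-empty activity after last_index."""
    n = len(activities)
    for step in range(1, n + 1):
        i = (last_index + step) % n
        if activities[i].queue:
            return i
    return None  # nothing is ready


def pick_largest_queue(activities):
    """Choose the activity whose input queue currently holds the most tuples."""
    ready = ready_indices(activities)
    if not ready:
        return None
    return max(ready, key=lambda i: len(activities[i].queue))


def pick_max_rate(activities):
    """Choose the ready activity with the highest tuple consumption rate."""
    ready = ready_indices(activities)
    if not ready:
        return None
    return max(ready, key=lambda i: activities[i].rate)


# Hypothetical flow state: three activities with different backlogs and rates.
flow = [
    Activity("filter", rate=5, queue=deque(range(3))),
    Activity("join", rate=2, queue=deque(range(10))),
    Activity("load", rate=8, queue=deque()),
]
```

With this state, the largest-queue rule selects `join` (ten waiting tuples), while the maximum-rate rule selects `filter`, because `load` has the highest rate but nothing to consume. This difference in choices is the kind of trade-off between memory consumption and execution time that the abstract refers to.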