• DocumentCode
    2050016
  • Title
    DARE: Adaptive Data Replication for Efficient Cluster Scheduling
  • Author
    Abad, Cristina L.; Lu, Yi; Campbell, Roy H.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    159
  • Lastpage
    168
  • Abstract
    Placing data as close as possible to computation is a common practice of data intensive systems, commonly referred to as the data locality problem. By analyzing existing production systems, we confirm the benefit of data locality and find that data have different popularity and varying correlation of accesses. We propose DARE, a distributed adaptive data replication algorithm that aids the scheduler to achieve better data locality. DARE solves two problems, how many replicas to allocate for each file and where to place them, using probabilistic sampling and a competitive aging algorithm independently at each node. It takes advantage of existing remote data accesses in the system and incurs no extra network usage. Using two mixed workload traces from Facebook, we show that DARE improves data locality by more than 7 times with the FIFO scheduler in Hadoop and achieves more than 85% data locality for the FAIR scheduler with delay scheduling. Turnaround time and job slowdown are reduced by 19% and 25%, respectively.
  • Keywords
    data handling; information retrieval; probability; sampling methods; scheduling; social networking (online); DARE; FAIR scheduler; FIFO scheduler; Facebook; Hadoop; cluster scheduling; competitive aging algorithm; data intensive system; data locality problem; delay scheduling; distributed adaptive data replication algorithm; job slowdown; network usage; probabilistic sampling; remote data access; turnaround time; Aging; Bandwidth; Cloud computing; Clustering algorithms; Distributed databases; Heuristic algorithms; Probabilistic logic; MapReduce; locality; replication; scheduling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2011.26
  • Filename
    6061051