Title :
Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O
Author :
Khanna, Gaurav ; Vydyanathan, Nagavijayalakshmi ; Catalyurek, Umit ; Kurc, Tahsin ; Krishnamoorthy, Sriram ; Sadayappan, P. ; Saltz, Joel
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH
Abstract :
This paper addresses the problem of efficiently executing a batch of data-intensive tasks with batch-shared I/O behavior on coupled storage and compute clusters. Two scheduling schemes are proposed: 1) a 0-1 integer programming (IP) based approach, which couples task scheduling and data replication, and 2) a bi-level hypergraph partitioning based heuristic approach (BiPartition), which decouples task scheduling and data replication. The experimental results show that: 1) the IP scheme achieves the best batch execution time but has significant scheduling overhead, which restricts its application to small-scale workloads, and 2) the BiPartition scheme is a better fit for larger workloads and systems: it has very low scheduling overhead and no more than 5-10% degradation in solution quality compared with the IP-based approach.
Keywords :
integer programming; processor scheduling; storage management; BiPartition scheme; batch-shared I/O behavior; bi-level hypergraph partitioning; data-intensive job; file replication; heuristic approach; task scheduling; Biomedical computing; Biomedical engineering; Biomedical informatics; Computer science; Data analysis; Data engineering; Degradation; Linear programming; Subcontracting
Conference_Title :
High Performance Distributed Computing, 2006 15th IEEE International Symposium on
Conference_Location :
Paris
Print_ISBN :
1-4244-0307-3
DOI :
10.1109/HPDC.2006.1652155