DocumentCode :
2197536
Title :
Decoupling computation and data scheduling in distributed data-intensive applications
Author :
Ranganathan, Kavitha ; Foster, Ian
Author_Institution :
Dept. of Comput. Sci., Chicago Univ., IL, USA
fYear :
2002
fDate :
2002
Firstpage :
352
Lastpage :
358
Abstract :
In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to the need to address a variety of metrics and constraints while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of algorithms and use simulation studies to evaluate various combinations of job scheduling and data movement strategies. Our results suggest that while it is necessary to consider the impact of replication, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation.
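The decoupled arrangement described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm family: the `Site` class, the `schedule_job` and `replicate` functions, the queue-length load proxy, and the access-count `threshold` policy are all hypothetical choices made for illustration. The point it shows is structural: the job scheduler only places jobs (preferring sites that already hold the data), while a separate replication step, run independently, copies popular data sets to lightly loaded sites based solely on observed accesses and load.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    # Hypothetical model of a Data Grid site (illustrative only).
    name: str
    queue_len: int = 0                                  # cumulative jobs placed here (load proxy)
    datasets: set = field(default_factory=set)          # data sets replicated at this site
    accesses: Counter = field(default_factory=Counter)  # data set -> accesses since last replication pass


def schedule_job(sites, dataset):
    """Job scheduler: run the job where its data set already resides,
    breaking ties by load. It never moves data itself (decoupled)."""
    holders = [s for s in sites if dataset in s.datasets]
    if not holders:
        raise ValueError(f"no replica of {dataset!r} exists")
    target = min(holders, key=lambda s: (s.queue_len, s.name))
    target.queue_len += 1
    target.accesses[dataset] += 1
    return target


def replicate(sites, threshold):
    """Asynchronous data scheduler: for each data set accessed at least
    `threshold` times at a site, copy it to the least-loaded site that
    lacks it. Uses only observed access counts and load."""
    moves = []
    for site in sites:
        for ds, count in site.accesses.items():
            if count < threshold:
                continue
            others = [s for s in sites if ds not in s.datasets]
            if not others:
                continue  # already replicated everywhere
            dest = min(others, key=lambda s: (s.queue_len, s.name))
            dest.datasets.add(ds)
            moves.append((ds, site.name, dest.name))
        site.accesses.clear()  # start a fresh observation window
    return moves
```

For example, with `d1` initially held only at site `A`, three jobs all land on `A`; a replication pass with `threshold=3` then copies `d1` to the idle site `B`, and the next job for `d1` is placed on `B` without the job scheduler ever having made a data-movement decision.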
Keywords :
distributed processing; processor scheduling; Data Grids; bioinformatics; data access patterns; data movement operations; data scheduling; decoupled asynchronous process; distributed data-intensive applications; geographically distributed resources; global allocation policies; high-energy physics; independent job sources; job scheduling decisions; large-scale data-intensive problems; local allocation policies; resource utilization; response time; scheduling framework; Application software; Bioinformatics; Computer science; Distributed computing; Laboratories; Large-scale systems; Physics computing; Processor scheduling; Resource management; Scheduling algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Distributed Computing, 2002. HPDC-11 2002. Proceedings. 11th IEEE International Symposium on
ISSN :
1082-8907
Print_ISBN :
0-7695-1686-6
Type :
conf
DOI :
10.1109/HPDC.2002.1029935
Filename :
1029935