DocumentCode :
2193911
Title :
A Robust Communication Framework for Parallel Execution on Volunteer PC Grids
Author :
Rohit, Eshwar ; Nguyen, Hien ; Kanna, Nagarajan ; Subhlok, Jaspal ; Gabriel, Edgar ; Wang, Qian ; Cheung, Margaret S. ; Anderson, David
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
fYear :
2011
fDate :
23-26 May 2011
Firstpage :
134
Lastpage :
143
Abstract :
Volunteer PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availability. A communicating parallel program must employ explicit redundancy, or implicit redundancy with uncoordinated checkpoint-restart, to make continuous forward progress in such an unreliable environment. A communication model based on one-sided Put/Get calls to an abstract global shared space is a good match, as processes can execute their communication operations independently and asynchronously. However, no existing system is designed for redundant communicating processes. The key problem is that a single logical operation that impacts the global program state may be executed by different instances of the same process at different times, leading to semantic inconsistency. This paper presents the design, execution model, implementation, and usage of Volpex, a communication layer for robust execution on volunteer PC grids. The research leads to a practical way to employ idle PCs for latency-tolerant parallel computing applications.
Keywords :
checkpointing; grid computing; parallel programming; redundancy; Volpex; communication layer; one sided Put-Get call; parallel execution; parallel program communication; redundant communicating process; robust communication framework; semantic inconsistency; uncoordinated checkpoint restart; volunteer PC grid; Computational modeling; Fault tolerant systems; Libraries; Programming; Redundancy; Servers; Desktop Grids; Fault Tolerance; Parallel execution; Redundant Computation; Volunteer Computing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on
Conference_Location :
Newport Beach, CA
Print_ISBN :
978-1-4577-0129-0
Electronic_ISBN :
978-0-7695-4395-6
Type :
conf
DOI :
10.1109/CCGrid.2011.72
Filename :
5948604