DocumentCode
2041531
Title
Monitoring remotely executing shared memory programs in software DSMs
Author
Fei, Long ; Fang, Xing ; Hu, Y. Charlie ; Midkiff, Samuel P.
Author_Institution
Purdue Univ., West Lafayette, IN
fYear
2006
fDate
25-29 April 2006
Abstract
Peer-to-peer (P2P) cycle sharing over the Internet has become increasingly popular as a way to share idle cycles. A fundamental problem faced by P2P cycle sharing systems is how to incrementally monitor and verify, with low overhead, the execution of jobs submitted to a remote untrusted hosting machine or cluster of machines. In this paper, we present the design and implementation of GripCop DSM, a novel incremental execution monitoring and verification scheme for software distributed shared memory (SDSM) programs running on remote clusters. Our scheme maximally leverages the shared memory abstraction provided by the SDSM system: it extends that abstraction to the monitoring process by replicating one of the processes running on the host cluster, so that intermediate results can be verified at runtime. GripCop DSM employs two monitoring schemes: (i) a full-scale monitoring scheme that completely replicates the computation of a process running on the cluster; and (ii) a decoy monitoring scheme that deceives the host cluster into believing that full-scale monitoring is being performed without it ever actually being done, thereby incurring negligible overhead. Experiments show that the combined use of full-scale and decoy monitoring ensures faithful execution with low performance impact, even over a wide area network.
Keywords
peer-to-peer computing; program verification; shared memory systems; system monitoring; GripCop; P2P; decoy monitoring; full-scale monitoring; incremental execution monitoring; peer-to-peer cycle sharing; program verification; remotely executing shared memory program monitoring; shared memory abstraction; software distributed shared memory; Central Processing Unit; Computer networks; Computerized monitoring; Condition monitoring; IP networks; Internet; Peer to peer computing; Remote monitoring; Runtime; Workstations;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location
Rhodes Island
Print_ISBN
1-4244-0054-6
Type
conf
DOI
10.1109/IPDPS.2006.1639276
Filename
1639276