• DocumentCode
    2041531
  • Title
    Monitoring remotely executing shared memory programs in software DSMs
  • Author
    Fei, Long ; Fang, Xing ; Hu, Y. Charlie ; Midkiff, Samuel P.
  • Author_Institution
    Purdue Univ., West Lafayette, IN
  • fYear
    2006
  • fDate
    25-29 April 2006
  • Abstract
    Peer-to-peer (P2P) cycle sharing over the Internet has become increasingly popular as a way to share idle cycles. A fundamental problem faced by P2P cycle sharing systems is how to incrementally monitor and verify, with low overhead, the execution of jobs submitted to a remote untrusted hosting machine or cluster of machines. In this paper, we present the design and implementation of GripCop DSM, a novel incremental execution monitoring and verification scheme for software distributed shared memory (SDSM) programs running on remote clusters. Our scheme extends the shared memory abstraction provided by the SDSM system to the monitoring process, which replicates one of the processes running on the host cluster in order to verify intermediate results at runtime. GripCop DSM employs two monitoring schemes: (i) a full-scale monitoring scheme that completely replicates the computation of a process running on the cluster; and (ii) a decoy monitoring scheme that deceives the host cluster into believing that full-scale monitoring is being performed when it is not, thereby incurring negligible overhead. Experiments show that the combined use of full-scale and decoy monitoring ensures faithful execution with low performance impact, even over a wide area network.
  • Keywords
    peer-to-peer computing; program verification; shared memory systems; system monitoring; GripCop; P2P; decoy monitoring; full-scale monitoring; incremental execution monitoring; peer-to-peer cycle sharing; remotely executing shared memory program monitoring; shared memory abstraction; software distributed shared memory; Central Processing Unit; Computer networks; Computerized monitoring; Condition monitoring; IP networks; Internet; Peer to peer computing; Remote monitoring; Runtime; Workstations
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
  • Conference_Location
    Rhodes Island
  • Print_ISBN
    1-4244-0054-6
  • Type
    conf
  • DOI
    10.1109/IPDPS.2006.1639276
  • Filename
    1639276