Title :
CCL v3.0: multiprogrammed semi-asynchronous checkpoints
Author :
Quaglia, Francesco ; Santoro, Andrea
Author_Institution :
Dipt. di Inf. e Sistemistica, Rome Univ., Italy
Abstract :
CCL (Checkpointing and Communication Library) is a recently developed software library supporting optimistic parallel simulation on Myrinet-based clusters. Beyond classical low-latency message delivery, the library implements CPU-offloaded, semi-asynchronous checkpointing based on the data transfer capabilities of the programmable DMA engine on board Myrinet network cards. The previous release of CCL (v2.4), designed for M2M-PCI32C Myrinet cards, supports only monoprogrammed semi-asynchronous checkpoints. This forces the CPU to resynchronize with the DMA engine whenever the simulation application issues a new checkpoint request while the previously issued one is still being carried out by the DMA engine. We present CCL v3.0, which exploits hardware features of the more advanced M3M-PCI64C Myrinet cards to support multiprogrammed semi-asynchronous checkpoints. The multiprogrammed approach allows a higher degree of concurrency between checkpointing and the other simulation-specific operations carried out by the CPU, with clear performance benefits. We also report an evaluation of these benefits for a personal communication system simulation application.
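The difference between monoprogrammed and multiprogrammed checkpoints can be illustrated with a toy timing model. This sketch is not the CCL interface: the function name `cpu_stall_time`, the fixed per-checkpoint DMA transfer time, and the FIFO DMA engine serving one transfer at a time are all assumptions made for illustration. Setting `max_outstanding=1` models monoprogrammed checkpoints, where the CPU must wait whenever the previous checkpoint is still in progress; larger values model multiprogrammed checkpoints, where the CPU blocks only once the queue of outstanding requests is full.

```python
from collections import deque


def cpu_stall_time(work_between_ckpts, dma_time, max_outstanding):
    """Total time the CPU spends blocked waiting on the DMA engine.

    Hypothetical toy model, not the CCL API:
    - the CPU performs work_between_ckpts[i] units of work, then issues checkpoint i;
    - the DMA engine serves requests one at a time in FIFO order, each taking dma_time;
    - at most max_outstanding requests may be queued or in service at once
      (max_outstanding=1 models monoprogrammed checkpoints).
    """
    clock = 0.0        # current CPU time
    dma_free_at = 0.0  # time at which the DMA engine finishes its last queued transfer
    pending = deque()  # completion times of issued but unfinished requests (ascending)
    stall = 0.0
    for work in work_between_ckpts:
        clock += work
        # Retire requests the DMA engine has already completed.
        while pending and pending[0] <= clock:
            pending.popleft()
        # If all request slots are busy, block until the oldest transfer completes.
        if len(pending) >= max_outstanding:
            wait_until = pending.popleft()
            stall += wait_until - clock
            clock = wait_until
        # Queue the new transfer behind any earlier ones.
        start = max(clock, dma_free_at)
        dma_free_at = start + dma_time
        pending.append(dma_free_at)
    return stall


# Checkpoints requested every 1 time unit, each DMA transfer taking 3 units.
print(cpu_stall_time([1, 1, 1, 1], 3, 1))  # monoprogrammed: 6.0
print(cpu_stall_time([1, 1, 1, 1], 3, 2))  # two outstanding requests: 3.0
print(cpu_stall_time([1, 1, 1, 1], 3, 4))  # four outstanding requests: 0.0
```

In this model, allowing more outstanding requests lets the CPU keep doing simulation work while earlier checkpoints are still being copied, which is the kind of concurrency gain the abstract attributes to the multiprogrammed approach. The model ignores transfer setup costs and memory consistency issues that a real implementation would have to handle.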
Keywords :
concurrency control; discrete event simulation; message passing; multiprogramming; optimisation; parallel processing; personal communication networks; software libraries; system recovery; CCL v3; CPU activity; Checkpointing and Communication Library; DMA activity; M3M-PCI64C myrinet card; multiprogrammed semiasynchronous checkpoint; myrinet-based cluster; optimistic parallel simulation; personal communication system simulation; programmable DMA engine; Checkpointing; Concurrent computing; Costs; Delay; Discrete event simulation; Engines; Hardware; Reduced instruction set computing; Remuneration; Software libraries;
Conference_Title :
Proceedings of the Seventeenth Workshop on Parallel and Distributed Simulation (PADS 2003), 2003
Print_ISBN :
0-7695-1970-9
DOI :
10.1109/PADS.2003.1207417