Title :
A genetic-based optimal checkpoint placement strategy for multicore processors
Author :
Lotfi, Atieh ; Safari, Saeed
Author_Institution :
Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
Abstract :
Multicore processors are increasingly deployed in high-performance computing systems. As system complexity grows, the probability of failure increases substantially, so such systems require techniques that support fault tolerance. Checkpointing is widely used to reduce the execution time of long-running programs in the presence of failures and to enhance the reliability of such systems. Optimizing the number of checkpoints in a parallel application running on a multicore processor is a complicated and challenging task: infrequent checkpointing results in long reprocessing time after a failure, while overly short checkpointing intervals lead to high checkpointing overhead. Because this is a multi-objective optimization problem, a search can easily become trapped in local optima. Bio-inspired algorithms, by contrast, are powerful function optimizers that have been used successfully to solve problems in many different areas. In this paper, a genetic algorithm, a well-known bio-inspired computing algorithm, is applied to find the optimal checkpoint placement in parallel applications. Under certain fault conditions, the new checkpoint placement strategy outperforms existing ones, with a significant reduction in total wasted time. Our experimental results show that the method, which is implementable on any message-passing multicore system, can optimally find the suitable points at which checkpoints should be taken.
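The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes a single process (no message passing), exponentially distributed failures, zero recovery cost, and a fixed checkpoint cost, and every name and parameter in it (expected_wasted_time, evolve_plan, slot_len, ckpt_cost, fail_rate) is hypothetical. A plan is a bit vector in which a 1 means "take a checkpoint after this work slot", and a basic genetic algorithm searches for a plan with low expected wasted time.

```python
import math
import random


def expected_wasted_time(plan, slot_len, ckpt_cost, fail_rate):
    """Expected time lost to checkpoints and rework (illustrative model only).

    Assumes exponential failures with rate fail_rate and restart from the
    last checkpoint at no extra cost, so a span of length s takes
    (exp(fail_rate * s) - 1) / fail_rate on average to complete.
    """
    wasted = 0.0
    segment = 0.0  # useful work accumulated since the last checkpoint
    for i, bit in enumerate(plan):
        segment += slot_len
        is_last = i == len(plan) - 1
        if bit or is_last:
            span = segment + (ckpt_cost if bit else 0.0)
            expected = (math.exp(fail_rate * span) - 1.0) / fail_rate
            wasted += expected - segment
            segment = 0.0
    return wasted


def evolve_plan(n_slots, slot_len, ckpt_cost, fail_rate,
                pop_size=40, generations=150, mutation_rate=0.02, seed=0):
    """Search for a low-cost plan with a basic genetic algorithm."""
    rng = random.Random(seed)

    def cost(plan):
        return expected_wasted_time(plan, slot_len, ckpt_cost, fail_rate)

    # Start from the two extreme plans plus random ones.
    population = [[0] * n_slots, [1] * n_slots]
    population += [[rng.randint(0, 1) for _ in range(n_slots)]
                   for _ in range(pop_size - 2)]

    for _ in range(generations):
        population.sort(key=cost)
        # Elitism: the two best plans survive unchanged.
        next_gen = [population[0][:], population[1][:]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(population, 3), key=cost)
            b = min(rng.sample(population, 3), key=cost)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_slots)
            child = a[:cut] + b[cut:]
            child = [g ^ 1 if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=cost)


# Example values are arbitrary and chosen only for illustration.
best = evolve_plan(n_slots=50, slot_len=1.0, ckpt_cost=0.5, fail_rate=0.05)
print("checkpoints after slots:", [i for i, g in enumerate(best) if g])
print("expected wasted time:", expected_wasted_time(best, 1.0, 0.5, 0.05))
```

Because the initial population contains both the "never checkpoint" and "checkpoint after every slot" plans and the best plans are carried over each generation, the returned plan is never worse than either extreme under this model. A model of a message-passing multicore system would additionally need to account for consistency between the checkpoints of communicating processes, which this sketch ignores.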
Keywords :
checkpointing; genetic algorithms; message passing; multiprocessing systems; probability; bio-inspired algorithm; checkpointing overhead; checkpointing technique; failure probability; fault tolerance; genetic algorithm; genetic-based optimal checkpoint placement strategy; high performance computing system; message-passing multicore system; multicore processor; multiobjective optimization problem; parallel application; program execution time; reprocessing time; Benchmark testing; Biological cells; Checkpointing; Genetic algorithms; Genetics; Multicore processing; Program processors; Fault Tolerance; Genetic Algorithm; Multicore Architectures; Optimal Checkpoint Placement;
Conference_Title :
Computer Architecture and Digital Systems (CADS), 2012 16th CSI International Symposium on
Conference_Location :
Shiraz, Fars
Print_ISBN :
978-1-4673-1481-7
DOI :
10.1109/CADS.2012.6316440