Regional Information Center for Science and Technology (RICeST) - Task Scheduling Algorithm for Multicore Processor System for Minimizing Recovery Time in Case of Single Node Fault

DocumentCode :

2320772

Title :

Task Scheduling Algorithm for Multicore Processor System for Minimizing Recovery Time in Case of Single Node Fault

Author :

Gotoda, Shohei ; Ito, Minoru ; Shibata, Naoki

Author_Institution :

Nara Inst. of Sci. & Technol. Nara, Nara, Japan

fYear :

2012

fDate :

13-16 May 2012

Firstpage :

260

Lastpage :

267

Abstract :

In this paper, we propose a task scheduling algorithm for a multicore processor system that reduces the recovery time in the case of a single fail-stop failure of a multicore processor. Many recently developed processors have multiple cores on a single die, so a single failure of a computing node results in the failure of many processors. When a multicore processor fails, all tasks that were executed on it have to be recovered at once. The proposed algorithm is based on an existing checkpointing technique, and we assume that the state is saved when nodes send results to the next node. If a series of computations that depends on former results is executed on a single die, all parts of that series have to be executed again if the processor fails. The proposed scheduling algorithm therefore tries not to concentrate tasks on the processors of a single die. We designed our algorithm as a parallel algorithm that achieves O(n) speedup, where n is the number of processors. We evaluated our method using simulations and experiments with four PCs. We compared our method with an existing scheduling method; in the simulation, the execution time including recovery time in the case of a node failure was reduced by up to 50%, while the overhead in the case of no failure was a few percent in typical scenarios.
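
The intuition in the abstract can be illustrated with a toy model. The sketch below is not the authors' algorithm; it only assumes a linear chain of dependent tasks, a checkpoint whenever a result is sent to a different die, and no communication cost. The function name `worst_case_recovery` and the example costs are hypothetical, chosen for illustration.

```python
from itertools import groupby


def worst_case_recovery(costs, dies):
    """Return the largest amount of work lost to a single die failure.

    Toy model (an assumption, not the paper's cost model): tasks form a
    linear chain, task i runs on die dies[i], and the state is saved
    whenever a result crosses from one die to another. A failure then
    forces re-execution of one contiguous run of tasks on the same die.
    """
    if len(costs) != len(dies):
        raise ValueError("costs and dies must have the same length")
    worst = 0
    start = 0
    # groupby yields each maximal run of consecutive tasks on one die.
    for _die, run in groupby(dies):
        length = len(list(run))
        worst = max(worst, sum(costs[start:start + length]))
        start += length
    return worst


costs = [4, 3, 5, 2]          # hypothetical execution times of four chained tasks
concentrated = [0, 0, 0, 0]   # the whole chain on one die
spread = [0, 1, 0, 1]         # consecutive tasks alternate between two dies

print(worst_case_recovery(costs, concentrated))
print(worst_case_recovery(costs, spread))
```

In this model the concentrated placement loses the whole chain on a failure, while the alternating placement loses at most one task. The model leaves out the extra communication that spreading introduces, which the abstract reports as an overhead of a few percent when no failure occurs.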

Keywords :

checkpointing; computational complexity; multiprocessing systems; parallel algorithms; processor scheduling; checkpointing technique; computing node; multicore processor system; node failure; parallel algorithm; recovery time minimization; single fail-stop failure; single node fault; task scheduling algorithm; Bandwidth; Computational modeling; Multicore processing; Schedules; Scheduling; Scheduling algorithms; multicore processor; node fault; recovery; task scheduling;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on

Conference_Location :

Ottawa, ON

Print_ISBN :

978-1-4673-1395-7

Type :

conf

DOI :

10.1109/CCGrid.2012.23

Filename :

6217430

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2320772