DocumentCode :
1187010
Title :
Task allocation and reallocation for fault tolerance in multicomputer systems
Author :
Chen, Chien-In Henry ; Cherkassky, Vladimir
Author_Institution :
Dept. of Electr. Eng., Wright State Univ., Dayton, OH
Volume :
30
Issue :
4
fYear :
1994
fDate :
10/1/1994 12:00:00 AM
Firstpage :
1094
Lastpage :
1104
Abstract :
The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce the job turnaround time. We propose a simple yet effective method for allocating tasks in multicomputer systems that minimizes the interprocessor communication cost subject to resource limitations defined by the system and designer. These limitations can be viewed as results of load balancing, since the execution time of each task, the number of available processors, processor speed, and memory capacity are known to the system or designer. As the number of processors increases, the probability of a failure existing somewhere in the system at any time also increases. Very few established task allocation models have considered the reliability property. In multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are then reallocated to processors statically and redundantly. This is a form of time redundancy: if some processors fail during execution, all tasks can still be completed on the remaining processors (but in a longer time). Because tasks are preallocated statically, this method is simpler and thus more practical than well-known dynamic reconfiguration and rollback recovery techniques in multicomputer systems. We demonstrate the effectiveness of task allocation and reallocation for hardware fault tolerance by applying the methods to several examples and to a practical communications network multiprocessor system.
Keywords :
fault tolerant computing; integer programming; minimisation; parallel architectures; redundancy; reliability; allocation; execution time; failure; fault tolerance; hardware fault tolerance; interconnected processors; interprocessor communication cost; job turnaround time; load balancing; minimisation; multicomputer systems; probability; reallocation; reliability; resource limitations; rollback recovery; static preallocation; task allocation; task scheduling; time redundancy; Communication networks; Costs; Fault tolerance; Fault tolerant systems; Hardware; Load management; Processor scheduling; Redundancy; Reliability; Resource management;
fLanguage :
English
Journal_Title :
Aerospace and Electronic Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9251
Type :
jour
DOI :
10.1109/7.328753
Filename :
328753