Title :
Failure Prediction Mechanisms in Cluster Systems
Author :
Sharifi, Mohsen ; Hamedi, Seyed Ali
Author_Institution :
Comput. Eng. Dept., Iran Univ. of Sci. & Technol., Tehran
fDate :
June 29 2008-July 5 2008
Abstract :
Clustering is an important technique for improving the performance and availability of computer systems. The use of cluster systems is also growing continuously because they offer features such as scalability, high availability, and high-performance computing. Availability is mainly governed by failure detection and recovery mechanisms, including proactive failure mechanisms that try to prevent the occurrence of faults. Given the importance of availability for high-performance computing, this paper surveys notable existing mechanisms for preventing faults in high-availability and high-performance computing cluster systems, and presents a comparative overview.
Keywords :
biology computing; medical computing; system recovery; workstation clusters; cluster systems; failure detection; failure prediction; proactive failure mechanisms; recovery mechanism; Availability; Bioinformatics; Biomedical computing; Biomedical engineering; Failure analysis; Fault detection; Fault tolerance; High performance computing; Network servers; Scalability;
Conference_Title :
Biocomputation, Bioinformatics, and Biomedical Technologies, 2008. BIOTECHNO '08. International Conference on
Conference_Location :
Bucharest
Print_ISBN :
978-0-7695-3191-5
Electronic_ISBN :
978-0-7695-3191-5
DOI :
10.1109/BIOTECHNO.2008.11