DocumentCode :
1665761
Title :
Clusters: challenges and opportunities
Author :
Reed, Daniel A.
Author_Institution :
Nat. Center for Supercomput. Applications, Illinois Univ., Urbana, IL, USA
fYear :
2003
Abstract :
Summary form only given. The continuum of cluster computing continues to expand, with terascale clusters now in production and petascale clusters in design. How do we manage clusters with tens of thousands of nodes, each with power, communication, processing, and memory constraints? How do we design, package, and support systems with hundreds of thousands of processors in a reliable way? This paper discusses the approaches to computing and communication fault-tolerance and reliability for large-scale clusters. It also sketches some of the technical challenges and opportunities in deploying and supporting large-scale clusters, highlighted by recent developments at NCSA and the U.S. TeraGrid and their application to emerging scientific applications.
Keywords :
performance evaluation; workstation clusters; cluster computing; communication fault tolerance; petascale clusters; reliability; terascale clusters; Energy management; Fault tolerance; Large-scale systems; Memory management; Packaging; Petascale computing; Power system management; Power system reliability; Production; USA Councils;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
ISSN :
1530-2075
Print_ISBN :
0-7695-1926-1
Type :
conf
DOI :
10.1109/IPDPS.2003.1213359
Filename :
1213359