Title :
Clusters: challenges and opportunities
Author_Institution :
Nat. Center for Supercomput. Applications, Illinois Univ., Urbana, IL, USA
Abstract :
Summary form only given. The continuum of cluster computing keeps expanding, with terascale clusters now in production and petascale clusters in design. How do we manage clusters with tens of thousands of nodes, each with power, communication, processing, and memory constraints? How do we reliably design, package, and support systems with hundreds of thousands of processors? This paper discusses approaches to computing and communication fault tolerance and reliability for large-scale clusters. It also sketches some of the technical challenges and opportunities in deploying and supporting large-scale clusters, highlighted by recent developments at NCSA and the U.S. TeraGrid and their application to emerging scientific applications.
Keywords :
performance evaluation; workstation clusters; cluster computing; communication fault tolerance; petascale clusters; reliability; terascale clusters; Energy management; Fault tolerance; Large-scale systems; Memory management; Packaging; Petascale computing; Power system management; Power system reliability; Production; USA Councils
Conference_Title :
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
Print_ISBN :
0-7695-1926-1
DOI :
10.1109/IPDPS.2003.1213359