DocumentCode
1665761
Title
Clusters: challenges and opportunities
Author
Reed, Daniel A.
Author_Institution
Nat. Center for Supercomput. Applications, Illinois Univ., Urbana, IL, USA
fYear
2003
Abstract
Summary form only given. The continuum of cluster computing continues to expand, with terascale clusters now in production and petascale clusters in design. How do we manage clusters with tens of thousands of nodes, each with power, communication, processing, and memory constraints? How do we reliably design, package, and support systems with hundreds of thousands of processors? This paper discusses approaches to computing and communication fault tolerance and reliability for large-scale clusters. It also sketches some of the technical challenges and opportunities in deploying and supporting large-scale clusters, highlighted by recent developments at NCSA and the U.S. TeraGrid and their use in emerging scientific applications.
Keywords
performance evaluation; workstation clusters; cluster computing; communication fault tolerance; petascale clusters; reliability; terascale clusters; Energy management; Fault tolerance; Large-scale systems; Memory management; Packaging; Petascale computing; Power system management; Power system reliability; Production; USA Councils
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
ISSN
1530-2075
Print_ISBN
0-7695-1926-1
Type
conf
DOI
10.1109/IPDPS.2003.1213359
Filename
1213359