DocumentCode
1064172
Title
Management of Online Processing Farms in the ATLAS Experiment
Author
Dobson, Marc ; Malik, Usman Ahmad ; Elejabarrieta, Hegoi Garitaonandia
Author_Institution
Eur. Organ. for Nucl. Res., Geneva
Volume
55
Issue
1
fYear
2008
Firstpage
411
Lastpage
416
Abstract
The ATLAS experiment will use on the order of three thousand nodes for its online processing farms. Administering such a large cluster is a challenge. Major issues include the ability to turn machines on and off quickly, especially after a power cut, and the ability to monitor hardware health remotely whether a machine is on or off. To solve these problems, ATLAS has decided to use the Intelligent Platform Management Interface (IPMI) for its nodes wherever possible. This paper presents the mechanisms developed to distribute management and monitoring commands to many machines. These commands were run simultaneously on the prototype farm, taking into account the specifics of the different IPMI versions and implementations and the network topology. Timing measurements for distributing commands to many nodes, and for booting and shutting down the nodes, are shown with an extrapolation to the final cluster size.
Keywords
computerised monitoring; high energy physics instrumentation computing; position sensitive particle detectors; ATLAS experiment; hardware monitoring; intelligent platform management interfaces; network topology; online processing farms; Condition monitoring; Hardware; Machine intelligence; Network topology; Personal communication networks; Pipelines; Prototypes; Remote monitoring; Sensor phenomena and characterization; Temperature sensors; ATLAS; Administration; cluster; farm
fLanguage
English
Journal_Title
Nuclear Science, IEEE Transactions on
Publisher
ieee
ISSN
0018-9499
Type
jour
DOI
10.1109/TNS.2007.913489
Filename
4448473