DocumentCode
3147353
Title
Fault Tolerant Data Acquisition through Dynamic Load Balancing
Author
Simon, Michal
Author_Institution
Fac. of Autom. Control, Electron. & Comput. Sci., Silesian Univ. of Technol., Gliwice, Poland
fYear
2011
fDate
16-20 May 2011
Firstpage
2049
Lastpage
2052
Abstract
Modern detectors used in high energy physics experiments are complex instruments designed to register particle collisions at rates in the MHz range. The data corresponding to a single collision, referred to as an event, are acquired from millions of readout channels and filtered, first by dedicated hardware and then by computing farms running sophisticated filtering algorithms. In data acquisition systems with single-stage software filtering, the high input rate (on the order of 100 kHz) means that the data are usually distributed statically among the filtering nodes. However, a static distribution strongly constrains the system and reduces its fault tolerance. The main objective of the presented studies is to increase the system's overall fault tolerance through dynamic load balancing. The proposed method aims to balance the workload in heterogeneous systems as well as in homogeneous systems, where an imbalance can be caused by faults. The research also includes the development of a scalable load balancing protocol together with a distributed asynchronous load assignment policy. As a case study, we consider the Data Acquisition system of the Compact Muon Solenoid experiment at CERN's new Large Hadron Collider.
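The abstract does not describe the balancing algorithm itself, so the following is only a toy sketch, under stated assumptions, of why a static distribution hurts when one filtering node is degraded by a fault. The function names, the per-event service times, and the event count are all hypothetical. The dynamic variant is a simple centralized greedy scheduler (each event goes to the node that becomes free earliest); it is a swapped-in illustration and not the paper's distributed asynchronous load assignment policy.

```python
import heapq


def static_makespan(n_events, service_times):
    """Static round-robin: event i always goes to node i mod N, ignoring node speed."""
    n = len(service_times)
    busy = [0.0] * n  # total processing time accumulated on each node
    for i in range(n_events):
        busy[i % n] += service_times[i % n]
    return max(busy)  # the slowest node determines when the batch is finished


def dynamic_makespan(n_events, service_times):
    """Greedy dynamic assignment: each event goes to the node that is free earliest."""
    heap = [(0.0, k) for k in range(len(service_times))]  # (time node is free, node id)
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(n_events):
        free_at, k = heapq.heappop(heap)
        done = free_at + service_times[k]
        finish = max(finish, done)
        heapq.heappush(heap, (done, k))
    return finish


if __name__ == "__main__":
    # Hypothetical farm: 4 filter nodes, node 3 degraded to a quarter of normal speed.
    times = [1.0, 1.0, 1.0, 4.0]
    print(static_makespan(100, times))   # 100.0
    print(dynamic_makespan(100, times))  # 32.0
```

With static distribution, every node receives a quarter of the events, so the degraded node alone sets the completion time; the dynamic assignment routes most events to the healthy nodes and finishes roughly three times sooner in this example.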
Keywords
data acquisition; distributed processing; fault tolerant computing; filtering theory; physics computing; resource allocation; sensors; compact Muon solenoid experiment; complex instruments; distributed asynchronous load assignment policy; dynamic load balancing; fault tolerance; fault tolerant data acquisition system; filtering algorithms; high energy physics experiments; particle collisions; single-stage software filtration; Algorithm design and analysis; Data acquisition; Distributed databases; Fault tolerance; Fault tolerant systems; Filtering; Load management;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location
Shanghai
ISSN
1530-2075
Print_ISBN
978-1-61284-425-1
Electronic_ISBN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2011.374
Filename
6009087