• DocumentCode
    3147353
  • Title
    Fault Tolerant Data Acquisition through Dynamic Load Balancing
  • Author
    Simon, Michal
  • Author_Institution
    Fac. of Autom. Control, Electron. & Comput. Sci., Silesian Univ. of Technol., Gliwice, Poland
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    2049
  • Lastpage
    2052
  • Abstract
    Modern detectors used in high energy physics experiments are complex instruments designed to register particle collisions at a rate in the MHz range. Data that correspond to a single collision of particles, referred to as an event, are acquired from millions of readout channels and filtered, first by dedicated hardware and then by computing farms running sophisticated filtering algorithms. In data acquisition systems with single-stage software filtration, the high input rate (on the order of 100 kHz) means that the data are usually distributed statically between filtering nodes. However, static distribution strongly constrains the system and results in decreased fault tolerance. The main objective of the presented studies is to increase the system's overall fault tolerance through dynamic load balancing. The proposed method aims to balance the workload inside heterogeneous systems as well as homogeneous systems, where the imbalance could be caused by faults. Moreover, our research includes developing a scalable load balancing protocol along with a distributed asynchronous load assignment policy. As a case study we consider the Data Acquisition system of the Compact Muon Solenoid experiment at CERN's new Large Hadron Collider.
  • Keywords
    data acquisition; distributed processing; fault tolerant computing; filtering theory; physics computing; resource allocation; sensors; Compact Muon Solenoid experiment; complex instruments; distributed asynchronous load assignment policy; dynamic load balancing; fault tolerance; fault tolerant data acquisition system; filtering algorithms; high energy physics experiments; particle collisions; single-stage software filtration; Algorithm design and analysis; Data acquisition; Distributed databases; Fault tolerance; Fault tolerant systems; Filtering; Load management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2011.374
  • Filename
    6009087