• DocumentCode
    1697646
  • Title
    Investigating resilient high performance reconfigurable computing with minimally-invasive system monitoring
  • Author
    Huang, Bin ; Schmidt, Andrew G. ; Mendon, Ashwin A. ; Sass, Ron
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of North Carolina at Charlotte, Charlotte, NC, USA
  • fYear
    2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    As researchers push for Exascale computing, one of the emerging challenges is system resilience. Recent reports suggest that, unlike fault-tolerant systems, which correct errors, resilient systems will need to continue making progress on an application despite faults. A first step in developing a resilient system is robust, scalable system monitoring. The work described here presents a novel, minimally-invasive system monitor that operates over a separate network. We analytically characterize its performance for an arbitrary set of nodes and demonstrate a working implementation of the design. We argue that this hardware approach is inherently superior to the ad hoc software techniques currently employed in practice.
  • Keywords
    invasive software; software fault tolerance; system monitoring; Exascale computing; fault tolerance; minimally-invasive system monitoring; resilient high performance reconfigurable computing; resilient system; scalable system monitoring; software technique; Biomedical monitoring; Field programmable gate arrays; Hardware; Magnetic heads; Monitoring; Resilience; Software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    High-Performance Reconfigurable Computing Technology and Applications (HPRCTA), 2010 Fourth International Workshop on
  • Conference_Location
    New Orleans, LA
  • ISSN
    2150-7945
  • Print_ISBN
    978-1-4244-9516-0
  • Electronic_ISBN
    2150-7945
  • Type
    conf
  • DOI
    10.1109/HPRCTA.2010.5670795
  • Filename
    5670795