• DocumentCode
    3677751
  • Title
    Distributed Recovery for Enterprise Services
  • Author
    Shane S. Clark;Jacob Beal;Partha Pal
  • Author_Institution
    Raytheon BBN Technologies, Cambridge, MA, USA
  • fYear
    2015
  • Firstpage
    111
  • Lastpage
    120
  • Abstract
    Small- to medium-scale enterprise systems are typically complex and highly specialized, but lack the management resources that can be devoted to large-scale (e.g., cloud) systems, making them extremely challenging to manage. Here we present an adaptive algorithm for addressing a common management problem in enterprise service networks: safely and rapidly recovering from the failure of one or more services. Due to poorly documented and shifting dependencies, a typical industry practice for this situation is to bring the entire system down, then restart services one at a time in a predefined order. We improve on this practice with the Dependency-Directed Recovery (DDR) algorithm, which senses dependencies by observing network interactions and recovers from failures near-optimally by following a distributed graph algorithm. Our Java-based implementation of this system is suitable for deployment with a wide variety of networked enterprise services, and we validate its correct operation and its advantage over fixed-order restart with emulation experiments on networks of up to 20 services.
  • Keywords
    "Servers","Monitoring","Electronic mail","Logic gates","Databases","Reliability","Sockets"
  • Publisher
    IEEE
  • Conference_Titel
    Self-Adaptive and Self-Organizing Systems (SASO), 2015 IEEE 9th International Conference on
  • Type
    conf
  • DOI
    10.1109/SASO.2015.19
  • Filename
    7306601