Title :
An Optimized Policy for Automatic Failure Recovery in Microrebootable Distributed Systems
Author :
Lu Xu ; Wang Hui-qiang ; Zhao Guo-sheng
Author_Institution :
Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
Abstract :
To address the difficulty of generating recovery policies when failure detection is inaccurate, this paper presents a failure recovery model for microrebootable distributed systems based on discounted Partially Observable Markov Decision Processes (POMDPs). Recovery policies are generated by solving the POMDP model. To reduce the computational complexity of an exact solution, a value-function approximation method, the fast informed bound, is used to obtain near-optimal policies. Simulation-based experiments on a realistic network security situation prediction system show that the proposed model can be solved effectively and that the resulting policies clearly outperform the alternatives compared.
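The fast informed bound named in the abstract can be illustrated with a minimal sketch. It keeps one value vector per action and repeatedly applies the standard update alpha_a(s) = R(s,a) + gamma * sum_o max_a' sum_s' T(s,a,s') Z(s',a,o) alpha_a'(s'). The two-state model below (states, actions, observations, probabilities, rewards, and the names `fast_informed_bound` and `greedy_action`) is entirely hypothetical and is not taken from the paper; it only stands in for a component that may be healthy or failed behind an imperfect failure detector.

```python
# Hypothetical toy recovery POMDP; every number here is an illustrative assumption.
STATES = ["healthy", "failed"]
ACTIONS = ["wait", "microreboot"]
OBSERVATIONS = ["ok", "alarm"]
GAMMA = 0.9  # discount factor

# T[a][s][s2]: probability of moving from state s to s2 under action a.
T = {
    "wait":        [[0.95, 0.05], [0.0, 1.0]],
    "microreboot": [[1.0, 0.0], [0.9, 0.1]],
}
# Z[a][s2][o]: probability of observation o in next state s2 (imperfect detector).
Z = {
    "wait":        [[0.9, 0.1], [0.2, 0.8]],
    "microreboot": [[0.9, 0.1], [0.2, 0.8]],
}
# R[a][s]: immediate reward (negative cost) of action a in state s.
R = {
    "wait":        [0.0, -10.0],
    "microreboot": [-1.0, -11.0],
}


def fast_informed_bound(tol=1e-9, max_iter=10000):
    """Iterate the FIB update until the per-action vectors stop changing."""
    n = len(STATES)
    alpha = {a: [0.0] * n for a in ACTIONS}
    for _ in range(max_iter):
        new_alpha = {}
        for a in ACTIONS:
            vec = []
            for s in range(n):
                total = 0.0
                for o in range(len(OBSERVATIONS)):
                    # Best successor vector chosen separately per observation.
                    total += max(
                        sum(T[a][s][s2] * Z[a][s2][o] * alpha[a2][s2]
                            for s2 in range(n))
                        for a2 in ACTIONS
                    )
                vec.append(R[a][s] + GAMMA * total)
            new_alpha[a] = vec
        delta = max(abs(new_alpha[a][s] - alpha[a][s])
                    for a in ACTIONS for s in range(n))
        alpha = new_alpha
        if delta < tol:
            break
    return alpha


def greedy_action(belief, alpha):
    """Pick the action whose vector has the largest inner product with the belief."""
    return max(ACTIONS, key=lambda a: sum(b * v for b, v in zip(belief, alpha[a])))


if __name__ == "__main__":
    alpha = fast_informed_bound()
    for belief in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        print(belief, greedy_action(belief, alpha))
```

Each sweep costs on the order of |A|^2 |S|^2 |O| operations, which is the reason this kind of bound is attractive when exact POMDP solution is too expensive. The resulting policy is read off by comparing the per-action vectors against the current belief, as `greedy_action` does.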
Keywords :
Markov processes; computer bootstrapping; optimisation; software fault tolerance; automatic failure recovery; failure detection; fast informed bound solution; micro-rebootable distributed systems; partially observable Markov decision processes; recovery policy optimization; value function approximate solution; Aging; Availability; Computational complexity; Computational modeling; Computer networks; Information science; Iterative methods; Neural networks; Predictive models; Software systems;
Conference_Title :
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4909-5
DOI :
10.1109/ICISE.2009.292