Title :
Improving wide-area distributed system availability
Author :
Wassermann, Bruno
Author_Institution :
Dept. of Comput. Sci., Univ. Coll. London, London, UK
Abstract :
The Software-as-a-Service (SaaS) paradigm and corresponding service-oriented technologies have simplified the development of larger, more complex software systems that routinely span administrative and organisational boundaries. These systems inhabit a complex operating environment with numerous threats to the dependability of service compositions. These threats include many system-level failures whose causes are difficult and time-consuming to determine. Vulnerabilities to these failures are difficult to detect before an application is deployed into production, and applications are currently not well equipped to handle them effectively. This results in lengthy downtimes of production systems and hence low availability. The goal of this PhD is to increase the availability of such systems by eliminating as many failures as possible before deployment and by helping administrators diagnose the causes of the remaining failures more efficiently. We propose a novel monitoring technique and apply failure injection techniques that target these difficult failures and enable separate administrative domains to cooperate in handling them. Furthermore, we investigate the extent to which we can equip these systems to be self-diagnosing.
Keywords :
cloud computing; program diagnostics; service-oriented architecture; system recovery; complex software systems; failure injection techniques; monitoring technique; self-diagnosing; service oriented technologies; software as a service paradigm; system level failure; wide area distributed system availability; Availability; Machine learning; Monitoring; Protocols; Testing; Web services
Conference_Title :
Software Engineering, 2010 ACM/IEEE 32nd International Conference on
Conference_Location :
Cape Town
Print_ISBN :
978-1-60558-719-6
DOI :
10.1145/1810295.1810386