Title :
Whirlwind: Overload protection, fault-tolerance and self-tuning in an Internet services platform
Author :
Donald, Peter ; Singh, Samar ; Ghosh, Somnath
Author_Institution :
Sch. of Comput. Sci. & Electron. Eng., La Trobe Univ., Melbourne, VIC, Australia
Abstract :
Performance and availability are critical when Internet services are integrated into emergency response management: poor performance or service failure can result in severe economic, social or environmental cost. This paper presents Whirlwind, a software architecture that includes primitives for overload management and fault tolerance. A Whirlwind service is composed of isolated, independent, sequential processes that communicate through asynchronous message passing. If a process fails, the fault is contained within that process and a message is propagated to monitoring processes, which may attempt to recover from the error. Processes are grouped with other processes that share similar resource, computation and concurrency requirements. Each group contains a scheduler and a thread pool that drives execution of the processes within the group. A group may also define a message predicate that determines whether a message posted to a process in the group is accepted. A rejected message typically signals overload and gives the application a chance to shed load and avoid overcommitting resources. Principals are shared between processes in different groups, enabling consistent prioritization and admission control across groups. The resource management policies are typically driven by feedback loops that monitor resource availability and system performance and adjust tuning parameters to meet performance goals. Whirlwind evolved over five fire seasons as part of emergency response software in Victoria, Australia.
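The group mechanics described in the abstract can be illustrated with a minimal Python sketch. It is not Whirlwind's actual API, which this record does not show: the names `Process`, `Group`, `post` and `accept` are hypothetical, the predicate is assumed to see the group's pending-message count, and monitors are assumed to live in the same group as the process they watch.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Process:
    """Hypothetical sequential process: one mailbox, one message at a time."""

    def __init__(self, handler, monitors=()):
        self.handler = handler          # callable invoked once per message
        self.monitors = list(monitors)  # processes told about failures
        self.mailbox = deque()
        self.running = False            # True while a pool thread drains the mailbox


class Group:
    """Hypothetical group: a thread pool plus a message predicate."""

    def __init__(self, workers, accept):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.accept = accept            # predicate(pending, message) -> bool
        self.pending = 0                # accepted but not yet handled messages
        self.lock = threading.Lock()

    def post(self, process, message):
        """Asynchronously post a message; False means it was rejected."""
        with self.lock:
            if not self.accept(self.pending, message):
                return False            # caller may shed load here
            self.pending += 1
            process.mailbox.append(message)
            if process.running:
                return True             # an active drain will pick it up
            process.running = True
        self.pool.submit(self._drain, process)
        return True

    def _drain(self, process):
        """Handle the process's messages in FIFO order on one pool thread."""
        while True:
            with self.lock:
                if not process.mailbox:
                    process.running = False
                    return
                message = process.mailbox.popleft()
            try:
                process.handler(message)
            except Exception as exc:
                # The fault stays inside this process; monitors are only
                # notified. The notification is itself subject to the
                # predicate, so it can be rejected under overload.
                for monitor in process.monitors:
                    self.post(monitor, ("failed", exc))
            finally:
                with self.lock:
                    self.pending -= 1
```

With a predicate such as `lambda pending, msg: pending < limit`, `post` returns `False` once `limit` messages are outstanding, which is the point where a caller could shed load. A feedback loop of the kind the abstract mentions could adjust `limit` at run time; that part is not sketched here.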
Keywords :
Internet; message passing; multi-threading; software architecture; software development management; software fault tolerance; Australia; Internet services; Victoria; Whirlwind; admission control; asynchronous message passing; emergency response management; emergency response software; fault tolerance; load shedding; monitoring processes; overload protection; resource availability; scheduler; self-tuning; thread pool; Availability; Condition monitoring; Costs; Disaster management; Environmental economics; Protection; Web and internet services; concurrency; overload
Conference_Title :
Communications (MICC), 2009 IEEE 9th Malaysia International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-5531-7
DOI :
10.1109/MICC.2009.5431539