Title :
Components and Analysis of Disaster Tolerant Computing
Author :
Lawler, Chad M. ; Harper, Michael A. ; Thornton, Mitchell A.
Author_Institution :
Data Return, LLC, Irving, TX
Abstract :
This paper reviews the components of disaster tolerant computing and communications and their current state in light of recent terrorist events. The paper examines the relationships between disaster tolerant systems, information technology (IT) application availability, and the executive level management visibility necessary for successful system operations in the event of a catastrophic disaster: one that causes rapid, almost simultaneous, multiple points of failure in a system, as well as single points of failure that escalate into wide catastrophic system failures. The technology, process, and human resource challenges of traditional disaster recovery approaches to disaster preparedness are outlined. The risks of IT application downtime attributable to the increasing dependence on critical information technology applications operating in distributed and unbounded networks are explored. A general method for disaster tolerance is proposed that mitigates unplanned downtime through a disciplined approach to IT infrastructure design based on redundancy and distributed components, with special attention given to the ability of executive level management to comprehend the value of an application's uptime and make appropriate capital investments. The importance of executive visibility into the system-wide impact of downtime, and the resulting effect on the cost of downtime of critical systems, is explored.
Keywords :
computer network management; computer network reliability; fault tolerant computing; system recovery; disaster recovery approaches; disaster tolerant communications; disaster tolerant computing; distributed networks; executive level management; information technology; unbounded networks; Availability; Business continuity; Computer science; Costs; Data engineering; Disaster management; Fault tolerant systems; Information technology; Military computing; Protection; Application Downtime; BCP; Business Continuity Planning; Cost of Downtime; DR; Disaster Recovery; Disaster Tolerance; Disaster Tolerant Computing and Communications; Executive Visibility; Survivability; Value of Uptime;
Conference_Title :
Performance, Computing, and Communications Conference, 2007. IPCCC 2007. IEEE International
Conference_Location :
New Orleans, LA
Print_ISBN :
1-4244-1138-6
Electronic_ISBN :
1097-2641
DOI :
10.1109/PCCC.2007.358917