• DocumentCode
    3475757
  • Title

    Fault-tolerance in a distributed management system: a case study

  • Author

    Smeikal, Robert ; Goeschka, Karl M.

  • Author_Institution
    Vienna Univ. of Technol., Austria
  • fYear
    2003
  • fDate
    3-10 May 2003
  • Firstpage
    478
  • Lastpage
    483
  • Abstract
    Our case study provides the most important conceptual lessons learned from the implementation of a Distributed Telecommunication Management System (DTMS), which controls a networked voice communication system. Major requirements for the DTMS are fault-tolerance against site or network failures, transactional safety, and reliable persistence. In order to provide distribution and persistence both transparently and fault-tolerantly, we introduce a two-layer architecture facilitating an asynchronous replication algorithm. Among the lessons learned are: component-based software engineering poses a significant initial overhead but is worth it in the long term; a fault-tolerant naming service is a key requirement for fail-safe distribution; the reasonable granularity for persistence and concurrency control is one whole object; asynchronous replication on the database layer is superior to synchronous replication on the instance level in terms of robustness and consistency; semi-structured persistence with XML has drawbacks regarding consistency, performance and convenience; in contrast to an arbitrarily meshed object model, an accentuated hierarchical structure is more robust and feasible; a query engine has to provide a means for navigation through the object model; finally, the propagation of deletion operations becomes more complex in an object-oriented model. By incorporating these lessons learned we are well underway to provide a highly available, distributed platform for persistent object systems.
  • Keywords
    distributed databases; distributed object management; object-oriented programming; software fault tolerance; voice communication; XML; asynchronous replication algorithm; component based software engineering; distributed telecommunication management system; fault-tolerance; naming service; networked voice communication system; object oriented programming; two-layer architecture; Communication system control; Control systems; Fault tolerance; Fault tolerant systems; Object oriented modeling; Robust control; Safety; Telecommunication control; Telecommunication network management; Telecommunication network reliability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, 2003. Proceedings. 25th International Conference on
  • ISSN
    0270-5257
  • Print_ISBN
    0-7695-1877-X
  • Type

    conf

  • DOI
    10.1109/ICSE.2003.1201225
  • Filename
    1201225