• DocumentCode
    2437758
  • Title
    Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems
  • Author
    Brito, Andrey ; Fetzer, Christof ; Felber, Pascal
  • Author_Institution
    Syst. Eng. Group, Tech. Univ. Dresden, Dresden, Germany
  • fYear
    2009
  • fDate
    22-26 June 2009
  • Firstpage
    173
  • Lastpage
    182
  • Abstract
    Event stream processing (ESP) applications target the real-time processing of huge amounts of data. Events traverse a graph of stream processing operators where the information of interest is extracted. As these applications gain popularity, the requirements for scalability, availability, and dependability increase. In terms of dependability and availability, many applications require a precise recovery, i.e., a guarantee that the outputs during and after a recovery would be the same as if the failure that triggered recovery had never occurred. Existing solutions for precise recovery induce prohibitive latency costs, either by requiring continuous checkpointing or logging (in a passive replication approach) or perfect synchronization between replicas executing the same operations (in an active replication approach). We introduce a novel technique to guarantee precise recovery for ESP applications while minimizing the latency costs as compared to traditional approaches. The technique minimizes latencies via speculative execution in a distributed system. In terms of scalability, the key component of our approach is a modified software transactional memory that provides not only the speculation capabilities but also optimistic parallelization for costly operations.
  • Keywords
    checkpointing; distributed processing; fault tolerant computing; graph theory; storage allocation; synchronisation; system monitoring; transaction processing; event stream processing; fault-tolerant distributed stream processing system recovery; graph traversal; latency cost minimization; optimistic parallelization; software transactional memory; Computer crashes; Costs; Data engineering; Data mining; Delay; Distributed computing; Electrostatic precipitators; Fault tolerant systems; Scalability; Fault-tolerance; Parallel Computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 2009. ICDCS '09. 29th IEEE International Conference on
  • Conference_Location
    Montreal, QC
  • ISSN
    1063-6927
  • Print_ISBN
    978-0-7695-3659-0
  • Electronic_ISBN
    1063-6927
  • Type
    conf
  • DOI
    10.1109/ICDCS.2009.35
  • Filename
    5158422