• DocumentCode
    628208
  • Title

    Automating the debugging of datacenter applications with ADDA

  • Author

    Zamfir, Cristian ; Altekar, Gautam ; Stoica, Ion

  • Author_Institution
    Sch. of Comput. & Commun. Sci., Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland
  • fYear
    2013
  • fDate
    24-27 June 2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Debugging data-intensive distributed applications running in datacenters is complex and time-consuming because developers do not have practical ways of deterministically replaying failed executions. The reason why building such tools is hard is that non-determinism that may be tolerable on a single node is exacerbated in large clusters of interacting nodes, and datacenter applications produce terabytes of intermediate data exchanged by nodes, thus making full input recording infeasible. We present ADDA, a replay-debugging system for datacenters that has lower recording and storage overhead than existing systems. ADDA is based on two techniques: First, ADDA provides control plane determinism, leveraging our observation that many typical datacenter applications consist of a separate “control plane” and “data plane”, and most bugs reside in the former. Second, ADDA does not record “data plane” inputs, instead it synthesizes them during replay, starting from the application's external inputs, which are typically persisted in append-only storage for reasons unrelated to debugging. We evaluate ADDA and show that it deterministically replays real-world failures in Hypertable and Memcached.
  • Keywords
    computer centres; distributed programming; failure analysis; program debugging; storage management; ADDA; Hypertable; Memcached; automatic datacenter application debugging; control plane determinism; data intensive distributed application debugging; data plane; failure analysis; interacting node; recording overhead; replay debugging system; storage overhead; Availability; Computer bugs; Debugging; Distributed databases; Hardware; Protocols; Servers; data-center; debugging; record-replay; reliability; storage;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks (DSN), 2013 43rd Annual IEEE/IFIP International Conference on
  • Conference_Location
    Budapest
  • ISSN
    1530-0889
  • Print_ISBN
    978-1-4673-6471-3
  • Type

    conf

  • DOI
    10.1109/DSN.2013.6575303
  • Filename
    6575303