• DocumentCode
    1512360
  • Title

    Handling timing errors in distributed programs

  • Author

    Gordon, Aaron J. ; Finkel, Raphael A.

  • Author_Institution
    Dept. of Math. & Comput. Sci., Colorado Sch. of Mines, Golden, CO, USA
  • Volume
    14
  • Issue
    10
  • fYear
    1988
  • Firstpage
    1525
  • Lastpage
    1535
  • Abstract
    The authors describe a tool called TAP, which is designed to aid the programmer in discovering the causes of timing errors in running programs. TAP is similar to a postmortem debugger: it uses the history of interprocess communication to construct a timing graph, a directed graph in which an edge joins node x to node y if event x directly precedes event y in time. The programmer can then use TAP to examine the graph and find the events that occurred in an unacceptable order. Because of the nondeterministic nature of distributed programs, the authors feel that a history-keeping mechanism must always be active so that bugs can be dealt with as they occur. The goal is to collect enough information at run time to construct the timing graph if needed. Since it is always active, this mechanism must be efficient. The authors also describe experiments run using TAP and report the impact that TAP's history-keeping mechanism has on the running time of various distributed programs.
  • Keywords
    directed graphs; distributed processing; program testing; software tools; TAP; directed graph; distributed programs; history-keeping mechanism; interprocess communication; postmortem debugger; timing errors; timing graph; Computer bugs; Computer errors; Computer science; Debugging; Degradation; Error correction; History; Operating systems; Programming profession; Timing
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/32.6197
  • Filename
    6197