• DocumentCode
    1830709
  • Title
    The effect of timeout prediction and selection on wide area collective operations
  • Author
    Plank, James S. ; Wolski, Rich ; Allen, Matthew
  • Author_Institution
    Dept. of Comput. Sci., Tennessee Univ., Knoxville, TN, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    320
  • Lastpage
    329
  • Abstract
    Failure identification is a fundamental operation concerning exceptional conditions that network programs must be able to perform. In this paper, we explore the use of timeouts to perform failure identification at the application level. We evaluate the use of static timeouts and of dynamic timeouts based on forecasts using the Network Weather Service. For this evaluation, we perform experiments on a wide-area collection of 31 machines distributed in eight institutions. Though the conclusions are limited to the collection of machines used, we observe that a single static timeout is not reasonable, even for a collection of similar machines over time. Dynamic timeouts perform roughly as well as the best static timeouts and, more importantly, they provide a single methodology for timeout determination that should be effective for wide-area applications.
  • Keywords
    failure analysis; supervisory programs; wide area networks; Network Weather Service; dynamic timeouts; exceptional conditions; failure identification; network programs; static timeouts; timeout prediction; timeout selection; wide-area collective operations; Computer networks; Computer science; Fault diagnosis; Grid computing; High performance computing; Performance evaluation; Sockets; Software libraries; Software packages; Weather forecasting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Computing and Applications, 2001. NCA 2001. IEEE International Symposium on
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    0-7695-1432-4
  • Type
    conf
  • DOI
    10.1109/NCA.2001.962548
  • Filename
    962548