• DocumentCode
    3008604
  • Title

    Embedded Gossip: Lightweight Online Measurement for Large-Scale Applications

  • Author

    Zhu, Wenbin ; Bridges, Patrick G. ; Maccabe, Arthur B.

  • Author_Institution
    University of New Mexico
  • fYear
    2007
  • fDate
    25-27 June 2007
  • Firstpage
    58
  • Lastpage
    58
  • Abstract
    For large-scale parallel applications, lightweight online monitoring can enable a wide range of online adaptations, including load balancing, power management, and progress monitoring. The processing and monitoring overhead of centralized global tracing techniques makes them unsuitable for such tasks. Purely local tools, on the other hand, fail to provide the global information necessary for many desirable online adaptations of large-scale applications. In this paper, we describe a novel distributed online measurement method for large-scale applications called Embedded Gossip (EG). EG works by piggybacking performance information about application behavior on existing application messages and merging received information with previously known data in a fashion customized to the needs of a particular monitoring task. EG thus provides each process with both local and global views of application behavior with low overhead. To illustrate the capabilities of Embedded Gossip, we also show that it disseminates global information in a timely fashion for a variety of monitoring tasks, including critical path profiling, workload imbalance monitoring, and progress monitoring. This global information has many potential uses, including imbalance detection for load balancing and energy management tools, progress monitoring for batch schedulers, and other performance debugging and optimization techniques.
  • Keywords
    Application software; Bridges; Computer architecture; Computer science; Computerized monitoring; Condition monitoring; Energy management; Laboratories; Large-scale systems; Load management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Distributed Computing Systems, 2007. ICDCS '07. 27th International Conference on
  • Conference_Location
    Toronto, ON, Canada
  • ISSN
    1063-6927
  • Print_ISBN
    0-7695-2837-3
  • Electronic_ISBN
    1063-6927
  • Type

    conf

  • DOI
    10.1109/ICDCS.2007.107
  • Filename
    4268211
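
The piggyback-and-merge mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `GossipProcess`, `merge_newest`, and `imbalance`, the versioned per-rank load samples, and the max-over-mean imbalance metric are all assumptions made for illustration. Message passing is simulated in one Python process; a real deployment would attach the gossip data to the application's own messages.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical data layout (not from the paper):
# rank -> (sample version, measured load in seconds)
View = Dict[int, Tuple[int, float]]


def merge_newest(known: View, received: View) -> View:
    """Task-specific merge: for each rank keep the sample with the higher version."""
    merged = dict(known)
    for rank, sample in received.items():
        if rank not in merged or sample[0] > merged[rank][0]:
            merged[rank] = sample
    return merged


class GossipProcess:
    """One application process holding a local view plus gossiped remote samples."""

    def __init__(self, rank: int,
                 merge: Callable[[View, View], View] = merge_newest) -> None:
        self.rank = rank
        self.merge = merge          # customizable per monitoring task
        self.view: View = {}
        self.version = 0

    def record_local(self, load: float) -> None:
        """Store a fresh local measurement under this process's own rank."""
        self.version += 1
        self.view[self.rank] = (self.version, load)

    def send(self, payload: Any) -> Tuple[Any, View]:
        """Piggyback a copy of the current view on an application message."""
        return payload, dict(self.view)

    def receive(self, message: Tuple[Any, View]) -> Any:
        """Merge the piggybacked view, then hand the payload to the application."""
        payload, gossip = message
        self.view = self.merge(self.view, gossip)
        return payload


def imbalance(view: View) -> float:
    """Assumed metric: maximum known load divided by mean known load."""
    loads = [load for _, load in view.values()]
    return max(loads) / (sum(loads) / len(loads)) if loads else 1.0


# Simulated message chain: rank 0 -> rank 1 -> rank 2.
p0, p1, p2 = GossipProcess(0), GossipProcess(1), GossipProcess(2)
p0.record_local(4.0)
p1.record_local(2.0)
p2.record_local(3.0)
p1.receive(p0.send("halo"))
p2.receive(p1.send("halo"))
print(sorted(p2.view))               # [0, 1, 2]
print(f"{imbalance(p2.view):.2f}")   # 1.33
```

In this sketch rank 2 learns about rank 0 without ever exchanging a message with it, because rank 1 forwards what it has merged. No extra messages are sent; the only added cost is the size of the piggybacked view. Swapping `merge_newest` for another function (for example, one that keeps the maximum path length per rank) is how the sketch would be adapted to a different monitoring task.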