• DocumentCode
    1490598
  • Title

    An Analytic Framework for Detailed Resource Profiling in Large and Parallel Programs and Its Application for Memory Use

  • Author

    Finkler, Ulrich

  • Author_Institution
    IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    59
  • Issue
    3
  • fYear
    2010
  • fDate
    3/1/2010 12:00:00 AM
  • Firstpage
    358
  • Lastpage
    370
  • Abstract
    Profiling is an essential and widely used technique to understand the resource use of applications. For example, the memory use of large applications is becoming an important cost factor. Very large systems are typically sized to accommodate designated tasks, and thus, the price, as well as cache and TLB efficiency, depends significantly on the memory footprint of the target applications. Importantly, the increasing use of multicore systems magnifies the problem since memory use grows with the number of parallel tasks. Additionally, the presence of multiple tasks or threads makes the problem of correlating resource use to the program structure harder. Thus, tools that correlate resource use with program structure with quantitative error margins are essential for optimizing the resource use of complex software applications. While efficient tools for the profiling of execution time are available, the choices for detailed profiling of memory use or other hardware resources are very limited. We were unable to find tools that provided sufficiently accurate insight into, e.g., memory use without adding unacceptable overhead in memory use and execution time for the performance analysis of very large applications. In this paper, we present a highly efficient probabilistic method for profiling that provides detailed resource usage information R_λ(t) indexed by the full location descriptor λ (e.g., process id, thread id, and call chain) and time t. Importantly, we provide an analytical framework that yields error estimates and allows a wide variety of profiling scenarios to be analyzed and quantitatively optimized. We employed the probabilistic approach to implement a memory profiling tool that adds minimal overhead and does not require recompilation or relinking. The tool provides the memory use M_λ(t) for all location descriptors λ over the execution time for single and multithreaded programs. Experimental results confirm that execution time and memory overhead are less than 10 percent of the unprofiled, optimized execution. Importantly, the technique is sufficiently general to be applicable to profiling of other hardware resources, such as cache or TLB misses, over time for all location descriptors with similarly low overhead and across multiple processes, threads, and processors.
  • Keywords
    cache storage; multi-threading; program diagnostics; resource allocation; TLB efficiency; analytic framework; cache efficiency; complex software applications; cost factor; detailed resource profiling; full location descriptor; memory footprint; memory use; multithreaded programs; parallel programs; performance analysis; probabilistic method; quantitative error margins; resource usage information; Application software; Computer errors; Costs; Hardware; Multicore processing; Performance analysis; Software tools; Yarn; Resource usage; call chain; memory usage; numerical; probabilistic; profiling
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2009.149
  • Filename
    5276794