• DocumentCode
    3525720
  • Title

    Optimizing software data prefetches with rotating registers

  • Author

    Doshi, Gautam ; Krishnaiyer, Rakesh ; Muthukumar, Kalyan

  • Author_Institution
    Intel Corp., Santa Clara, CA, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    257
  • Lastpage
    267
  • Abstract
    Software data prefetching is a well-known technique to improve the performance of programs that suffer many cache misses at several levels of the memory hierarchy. However, it has significant overhead in terms of increased code size, additional instructions, and possibly increased memory bus traffic due to redundant prefetches. This paper presents two novel methods to reduce the overhead of software data prefetching and improve program performance by optimized prefetch scheduling. These methods exploit the availability of rotating registers and predication in architectures such as the Itanium™ architecture. The methods (1) minimize redundant prefetches, (2) reduce the number of issue slots needed for prefetch instructions, and (3) avoid branch mispredict penalties, all with minimal code size increase. Compared to traditional data prefetching techniques, these methods (i) do not require loop unrolling, (ii) do not require predicate computations, and (iii) require fewer machine resources. One of these methods has been implemented in the Intel Production Compiler for the Itanium™ processor. This technique is compared with traditional approaches for software prefetching, and experimental results are presented based on the floating-point benchmark suite of CPU2000.
  • Keywords
    computer architecture; floating point arithmetic; optimising compilers; software performance evaluation; storage management; CPU2000; Intel Production Compiler; Itanium architecture; branch mispredict penalties; cache misses; code size; code size increase; floating-point benchmark suite; issue slots; memory bus traffic; memory hierarchy levels; optimized prefetch scheduling; overhead; predication; prefetch instructions; program performance; redundant prefetches; rotating registers; software data prefetching optimization; Availability; Computer architecture; Delay; Educational institutions; Optimization methods; Prefetching; Production; Registers; Scheduling; Software performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architectures and Compilation Techniques, 2001. Proceedings. 2001 International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1089-796X
  • Print_ISBN
    0-7695-1363-8
  • Type

    conf

  • DOI
    10.1109/PACT.2001.953306
  • Filename
    953306