• DocumentCode
    3080274
  • Title
    A tile size selection analysis for blocked array layouts
  • Author
    Athanasaki, Evangelia ; Koziris, Nectarios ; Tsanakas, Panayiotis
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Nat. Tech. Univ. of Athens, Greece
  • fYear
    2005
  • fDate
    13 Feb. 2005
  • Firstpage
    70
  • Lastpage
    80
  • Abstract
    Efficient use of the memory hierarchy is essential for good performance, given the ever-increasing gap between processor and memory speeds. Program transformations such as loop tiling have been shown to be an effective approach to improving locality and cache exploitation, especially for dense-matrix scientific computations. In conjunction with tiling, several experimental studies have been conducted on blocked data layouts, a data transformation technique used to boost cache performance. The stability of the achieved performance improvements is heavily dependent on the appropriate selection of tile sizes, taking into account the actual layout of the arrays in memory. In this paper, we first provide a theoretical analysis of the cache and TLB performance of blocked data layouts. According to this analysis, the optimal tile size that maximizes L1 cache utilization should fit completely in the L1 cache, to avoid any interference misses. We prove that when optimization techniques such as register assignment, array alignment, prefetching and loop unrolling are applied, tile sizes equal to the L1 capacity offer better cache utilization, even for loop bodies that access more than one array. Increased self- and/or cross-interference misses are now tolerated through prefetching. Such larger tiles also reduce the CPU cycles lost to branch misprediction, because fewer branches are mispredicted. Results are validated through simulations and actual benchmarks on various modern platforms.
  • Keywords
    cache storage; optimising compilers; program control structures; L1 cache; blocked array layout; cache miss analysis; data transformation technique; optimization technique; prefetching; tile size selection analysis; translation lookaside buffer; Delay; Interference; Laboratories; Microprocessors; Performance analysis; Prefetching; Registers; Stability; Systems engineering and theory; Tiles; blocked array layouts; tile selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Interaction between Compilers and Computer Architectures, 2005. INTERACT-9. 9th Annual Workshop on
  • ISSN
    1550-6207
  • Print_ISBN
    0-7695-2321-8
  • Type
    conf
  • DOI
    10.1109/INTERACT.2005.1
  • Filename
    1423142
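
The abstract's two central ideas, a blocked array layout and a tile chosen to fill the L1 cache, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the restriction to power-of-two tile edges, the row-major ordering of tiles and of elements within a tile, and the requirement that the array edge be a multiple of the tile edge are all assumptions made for illustration.

```python
def largest_tile_edge(l1_bytes: int, elem_bytes: int = 8) -> int:
    """Largest power-of-two T such that one T x T tile fits in l1_bytes.

    Assumption (not from the record): tile edges are powers of two.
    """
    if l1_bytes < elem_bytes:
        raise ValueError("cache is smaller than one element")
    t = 1
    # Double the edge while the doubled tile still fits in the cache.
    while (2 * t) * (2 * t) * elem_bytes <= l1_bytes:
        t *= 2
    return t


def blocked_offset(i: int, j: int, n: int, t: int) -> int:
    """Linear offset of element (i, j) in an n x n array stored tile by tile.

    Assumption: tiles are laid out in row-major order, and the elements
    inside each tile are also row-major, so each tile is contiguous.
    """
    if n % t != 0:
        raise ValueError("n must be a multiple of t in this sketch")
    tiles_per_row = n // t
    tile_index = (i // t) * tiles_per_row + (j // t)
    return tile_index * t * t + (i % t) * t + (j % t)


if __name__ == "__main__":
    # Hypothetical 32 KiB L1 data cache holding 8-byte elements.
    t = largest_tile_edge(32 * 1024, 8)
    print("tile edge:", t, "tile bytes:", t * t * 8)
    print("offset of (4, 0) in an 8 x 8 array with 4 x 4 tiles:",
          blocked_offset(4, 0, 8, 4))
```

Because each tile occupies one contiguous run of memory under this layout, a tile whose footprint equals the cache capacity maps onto the cache without overlapping itself, which is the situation the abstract's sizing argument relies on.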