  • DocumentCode
    2453872
  • Title
    Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
  • Author
    Chu, Michael ; Ravindra, Rajiv ; Mahlke, Scott
  • Author_Institution
    Univ. of Michigan, Ann Arbor
  • fYear
    2007
  • fDate
    1-5 Dec. 2007
  • Firstpage
    369
  • Lastpage
    380
  • Abstract
    The recent design shift towards multicore processors has spawned a significant amount of research in the area of program parallelization. The future abundance of cores on a single chip requires programmer and compiler intervention to increase the amount of parallel work possible. Much of the recent work has fallen into the areas of coarse-grain parallelization: new programming models and different ways to exploit threads and data-level parallelism. This work focuses on a complementary direction, improving performance through automated fine-grain parallelization. The main difficulty in achieving a performance benefit from fine-grain parallelism is the distribution of data memory accesses across the data caches of each core. Poor choices in the placement of data accesses can lead to increased memory stalls and low resource utilization. We propose a profile-guided method for partitioning memory accesses across distributed data caches. First, a profile determines affinity relationships between memory accesses and working set characteristics of individual memory operations in the program. Next, a program-level partitioning of the memory operations is performed to divide the memory accesses across the data caches. As a result, the data accesses are proactively dispersed to reduce memory stalls and improve computation parallelization. A final detailed partitioning of the computation instructions is performed with knowledge of the cache location of their associated data. Overall, our data partitioning reduces stall cycles by up to 51% versus data-incognizant partitioning, and has an overall speedup average of 30% over a single-core processor.
  • Keywords
    cache storage; graph theory; multiprocessing systems; parallel processing; affinity relationship; cache location; data access graph; data access partitioning; data-level parallelism; distributed data cache; fine-grain parallelization; memory access partitioning; memory stall; multicore architecture; multicore processors; profile-guided method; program parallelization; resource utilization; Computer architecture; Laboratories; Microarchitecture; Multicore processing; Multiprocessor interconnection networks; Parallel processing; Parallel programming; Program processors; Programming profession; Yarn;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Microarchitecture, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on
  • Conference_Location
    Chicago, IL
  • ISSN
    1072-4451
  • Print_ISBN
    978-0-7695-3047-5
  • Electronic_ISBN
    1072-4451
  • Type
    conf
  • DOI
    10.1109/MICRO.2007.15
  • Filename
    4408269