• DocumentCode
    1998766
  • Title
    Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models
  • Author
    Chen Chen ; Yao Wu ; Zuckerman, Stephane ; Gao, Guang R.
  • Author_Institution
    Electr. & Comput. Eng. Dept., Univ. of Delaware, Newark, DE, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1607
  • Lastpage
    1617
  • Abstract
    The codelet model is a fine-grain, dataflow-inspired program execution model that balances the parallelism and overhead of the runtime system. It plays an important role in terms of performance, scalability, and energy efficiency in exascale studies such as the DARPA UHPC project and the DOE X-Stack project. As an important application, the Fast Fourier Transform (FFT) has been deeply studied in fine-grain models, including the codelet model. However, existing work focuses on how fine-grain models achieve a more balanced workload compared to traditional coarse-grain models. In this paper, we make an important observation that the flexibility of execution order of tasks in fine-grain models improves utilization of memory bandwidth as well. We use the codelet model and the FFT application as a case study to show that a proper execution order of tasks (or codelets) can significantly reduce memory contention and thus improve performance. We propose an algorithm that provides heuristic guidance for the execution order of the codelets to reduce memory contention. We implemented our algorithm on the IBM Cyclops-64 architecture. Experimental results show that our algorithm improves performance by up to 46% compared to a state-of-the-art coarse-grain implementation of the FFT application on Cyclops-64.
  • Keywords
    data flow computing; fast Fourier transforms; mathematics computing; multiprocessing systems; resource allocation; FFT application; IBM Cyclops-64 architecture; coarse-grain models; codelet model; fine-grain dataflow-inspired program execution model; fine-grain execution models; fine-grain models; many-core architecture; memory bandwidth utilization; memory contention reduction; memory-load balanced fast Fourier transformations; runtime system; Arrays; Computational modeling; Instruction sets; Memory management; Random access memory; Synchronization; FFT; execution model; fine-grain; memory bandwidth
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-0-7695-4979-8
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2013.47
  • Filename
    6651057