  • DocumentCode
    3706760
  • Title
    Communication Avoiding Power Scaling
  • Author
    John Leidel;Yong Chen
  • Author_Institution
    Whitacre Coll. of Eng., Texas Tech Univ., Lubbock, TX, USA
  • fYear
    2015
  • Firstpage
    166
  • Lastpage
    174
  • Abstract
    Recent system-on-chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned OpenBLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.
  • Keywords
    "Power measurement","Heuristic algorithms","Mathematical model","System-on-chip","Computer architecture","Monitoring","Hardware"
  • Publisher
    ieee
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2015 44th International Conference on
  • ISSN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2015.26
  • Filename
    7349908