• DocumentCode
    625644
  • Title
    Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures
  • Author
    Ballard, Grey ; Becker, Daniel ; Demmel, J. ; Dongarra, Jack ; Druinsky, A. ; Peled, Inon ; Schwartz, Ofer ; Toledo, Sivan ; Yamazaki, Ichitaro
  • Author_Institution
    Univ. of California, Berkeley, Berkeley, CA, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    895
  • Lastpage
    907
  • Abstract
    Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, there is no scalable factorization algorithm that takes advantage of the symmetry and guarantees numerical stability through pivoting at the same time. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming like irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen's algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance-enhancing techniques; e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms.
  • Keywords
    information retrieval; matrix decomposition; multiprocessing systems; numerical stability; parallel architectures; parallel programming; scheduling; AMD Opteron processors; MKL; blocked Aasen algorithm; computational kernel; computed residual norms; data access; dense symmetric indefinite matrix; dynamic scheduler; engineering simulations; irregular task dependencies; left-looking algorithm; multicore architectures; numerical stability; parallel programming; parallel reduction; scientific simulations; symmetric pivoting; tall-skinny LU factorization algorithms; Equations; Heuristic algorithms; Multicore processing; Numerical stability; Partitioning algorithms; Plasmas; Symmetric matrices;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
  • Conference_Location
    Boston, MA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-6066-1
  • Type
    conf
  • DOI
    10.1109/IPDPS.2013.98
  • Filename
    6569872