• DocumentCode
    2958467
  • Title

    Hybrid Static/Dynamic Scheduling for Already Optimized Dense Matrix Factorization

  • Author

    Donfack, Simplice ; Grigori, Laura ; Gropp, William D. ; Kale, Vivek

  • Author_Institution
    INRIA Saclay-Ile de France, Univ. Paris-Sud 11, Orsay, France
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    496
  • Lastpage
    507
  • Abstract
    We present a hybrid static/dynamic strategy for scheduling the task dependency graph of direct methods used in dense numerical linear algebra. This strategy balances data locality, load balance, and low dequeue overhead. We show that using this scheduling in communication-avoiding dense factorization leads to significant performance gains. On a 48-core AMD Opteron NUMA machine, our experiments show up to 64% improvement over a version of CALU that uses fully dynamic scheduling, and up to 30% improvement over a version of CALU that uses fully static scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic scheduling approach is up to 8% faster than the versions of CALU that use fully static or fully dynamic scheduling. Our algorithm also yields speedups over the corresponding LU factorization routines in well-known libraries. On the 48-core AMD NUMA machine, our best implementation is up to 110% faster than MKL, while on the 16-core Intel Xeon machine, it is up to 82% faster than MKL. Our approach also shows significant speedups compared with PLASMA on both of these systems.
  • Keywords
    graph theory; linear algebra; matrix decomposition; multiprocessing systems; processor scheduling; resource allocation; 16-core Intel Xeon machine; AMD Opteron NUMA machine; CALU; LU factorization; PLASMA; data locality; dense numerical linear algebra; dequeue overhead; fully dynamic scheduling; fully static scheduling; hybrid static/dynamic scheduling; load balance; optimized dense matrix factorization; task dependency graph; Computer architecture; Dynamic scheduling; Heuristic algorithms; Instruction sets; Layout; Libraries; Processor scheduling; communication-avoiding; numerical linear algebra
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS)
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-0975-2
  • Type

    conf

  • DOI
    10.1109/IPDPS.2012.53
  • Filename
    6267853