• DocumentCode
    3543232
  • Title
    Improving Linear Algebra Computation on NUMA Platforms through Auto-tuned Nested Parallelism
  • Author
    Cuenca, Javier ; García, Luis-Pedro ; Giménez, Domingo
  • Author_Institution
    Dept. de Ing. y Tecnol. de Comput., Univ. de Murcia, Murcia, Spain
  • fYear
    2012
  • fDate
    15-17 Feb. 2012
  • Firstpage
    66
  • Lastpage
    73
  • Abstract
    The most computationally demanding scientific and engineering problems are solved on large parallel systems. In some cases these systems are Non-Uniform Memory Access (NUMA) multiprocessors made up of a large number of cores that share a hierarchically organized memory. Basic linear algebra routines such as those in BLAS typically constitute the computational kernel for these problems, so using them efficiently on such systems would contribute to a faster solution of a wide range of scientific problems. Normally a multithreaded BLAS library optimized for the system is used, but performance degrades significantly as the number of cores increases, which can lead to inefficient use of these large, expensive systems. This paper empirically analyses the behaviour of the BLAS matrix multiplication routine on large NUMA systems, and its combination with OpenMP to obtain nested parallelism. With the auto-tuning method proposed in this work, the execution time is reduced with respect to that of the library's matrix multiplication.
  • Keywords
    application program interfaces; file organisation; linear algebra; matrix multiplication; multi-threading; multiprocessing systems; parallel processing; NUMA platform; OpenMP; autotuned nested parallelism; autotuning method; basic linear algebra subroutines; execution time reduction; hierarchically organized memory; matrix multiplication; multithreaded BLAS library; nonuniform memory access multiprocessor; Computers; Instruction sets; Libraries; Mathematical model; Multicore processing; Parallel processing; Random access memory; auto-tuning; linear algebra; nested parallelism
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2012 20th Euromicro International Conference on
  • Conference_Location
    Garching
  • ISSN
    1066-6192
  • Print_ISBN
    978-1-4673-0226-5
  • Type
    conf
  • DOI
    10.1109/PDP.2012.12
  • Filename
    6169531