  • DocumentCode
    1016545
  • Title

    Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization

  • Author

    Kurzak, Jakub ; Buttari, Alfredo ; Dongarra, Jack

  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Tennessee Univ., Knoxville, TN
  • Volume
    19
  • Issue
    9
  • fYear
    2008
  • Firstpage
    1175
  • Lastpage
    1186
  • Abstract
    The Sony/Toshiba/IBM (STI) CELL processor introduces pioneering solutions in processor architecture. At the same time, it presents new challenges for the development of numerical algorithms. One is effective exploitation of the differential between the speed of single and double precision arithmetic; the other is efficient parallelization between the short vector SIMD cores. The first challenge is addressed by utilizing the well-known technique of iterative refinement for the solution of a dense symmetric positive definite system of linear equations, resulting in a mixed-precision algorithm, which delivers double precision accuracy while performing the bulk of the work in single precision. The main contribution of this paper lies in addressing the second challenge by successful thread-level parallelization, exploiting fine-grained task granularity and lightweight decentralized synchronization. The implementation of the computationally intensive sections gets within 90 percent of peak floating point performance, while the implementation of the memory intensive sections reaches within 90 percent of peak memory bandwidth. On a single CELL processor, the algorithm achieves over 170 Gflop/s when solving a symmetric positive definite system of linear equations in single precision and over 150 Gflop/s when delivering the result in double precision accuracy.
  • Keywords
    instruction sets; parallel algorithms; synchronisation; task analysis; Cholesky factorization; Sony/Toshiba/IBM CELL processor; fine-grained task granularity; lightweight decentralized synchronization; linear equations; processor architecture; short-vector single-instruction multiple-data cores; Linear systems; Numerical Linear Algebra; Parallel algorithms
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2007.70813
  • Filename
    4407694
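
The mixed-precision idea summarized in the abstract (factor and solve in single precision, then recover double precision accuracy through iterative refinement) can be illustrated with a small sketch. This is not the authors' CELL implementation: it is sequential, uses no SIMD or thread-level parallelism, and emulates single precision by rounding every intermediate result to binary32 with Python's `struct` module. The names `f32`, `cholesky_f32`, `solve_f32`, `residual`, and `refine`, the fixed iteration count, and the 3x3 test matrix are all assumptions made for illustration.

```python
import math
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cholesky_f32(a):
    """Lower-triangular L with A ~= L L^T; every operation is rounded to single."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = f32(a[j][j])
        for k in range(j):
            s = f32(s - f32(l[j][k] * l[j][k]))
        l[j][j] = f32(math.sqrt(s))  # raises ValueError if A is not positive definite
        for i in range(j + 1, n):
            s = f32(a[i][j])
            for k in range(j):
                s = f32(s - f32(l[i][k] * l[j][k]))
            l[i][j] = f32(s / l[j][j])
    return l


def solve_f32(l, b):
    """Solve L L^T x = b by forward and back substitution, rounded to single."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        s = f32(b[i])
        for k in range(i):
            s = f32(s - f32(l[i][k] * y[k]))
        y[i] = f32(s / l[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        s = y[i]
        for k in range(i + 1, n):
            s = f32(s - f32(l[k][i] * x[k]))
        x[i] = f32(s / l[i][i])
    return x


def residual(a, x, b):
    """r = b - A x, evaluated in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def refine(a, b, iters=5):
    """Return (single-precision solution, refined solution).

    The factorization and all triangular solves run in emulated single
    precision; only the residual and the solution update use double precision.
    The fixed iteration count is an illustrative choice, not a convergence test.
    """
    l = cholesky_f32(a)
    x = solve_f32(l, b)
    x_single = list(x)
    for _ in range(iters):
        r = residual(a, x, b)
        z = solve_f32(l, r)  # correction computed with the single-precision factor
        x = [xi + zi for xi, zi in zip(x, z)]
    return x_single, x


if __name__ == "__main__":
    # Hypothetical well-conditioned SPD system whose exact solution is [1, 2, 3].
    a = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    b = [12.0, 7.0, 17.0]
    x_single, x_refined = refine(a, b)
    print("single :", x_single)
    print("refined:", x_refined)
```

The expensive O(n^3) factorization is done once in the faster precision, while each refinement step costs only O(n^2). For a reasonably well-conditioned matrix, a few such steps are typically enough to bring the solution to double precision accuracy, which is the trade-off the abstract describes.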