• DocumentCode
    949083
  • Title
    Recursive array layouts and fast matrix multiplication
  • Author
    Chatterjee, S. ; Lebeck, A.R. ; Patnala, P.K. ; Thottethodi, M.
  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    13
  • Issue
    11
  • fYear
    2002
  • Firstpage
    1105
  • Lastpage
    1123
  • Abstract
    The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication and the more complex algorithms of Strassen (1969) and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance by 10 to 20 percent. Carrying the recursive layout down to the level of individual matrix elements is shown to be counterproductive; a combination of recursive layouts down to canonically ordered matrix tiles instead yields higher performance. Five recursive layouts with successively increasing complexity of address computation are evaluated, and it is shown that addressing overheads can be kept under control even for the most computationally demanding of these layouts.
  • Keywords
    cache storage; data structures; mathematics computing; matrix multiplication; parallel algorithms; performance evaluation; cache conflicts; complexity; execution times; false sharing; fast matrix multiplication; memory system behavior; memory system performance; parallel processing; recursive algorithms; recursive array layouts; recursive layouts; Data structures; Interference; Linear algebra; Matrix decomposition; Parallel processing; Robust control; System performance
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2002.1058095
  • Filename
    1058095
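
The abstract's observation that recursion works best when it stops at canonically ordered tiles can be illustrated with a small address computation. The following is a minimal sketch, not the paper's implementation: it assumes a Z-order (Morton) arrangement of tiles as one example of a recursive layout, row-major order inside each tile, and a matrix side `n` equal to `tile` times a power of two. The function names `interleave_bits` and `tiled_morton_offset` are hypothetical.

```python
def interleave_bits(row, col, bits):
    """Z-order (Morton) index of (row, col).

    Interleaves the low `bits` bits of row and col, placing each row bit
    just above the corresponding col bit.
    """
    z = 0
    for b in range(bits):
        z |= ((row >> b) & 1) << (2 * b + 1)
        z |= ((col >> b) & 1) << (2 * b)
    return z


def tiled_morton_offset(i, j, n, tile):
    """Linear offset of element (i, j) in an n x n matrix.

    Tiles are arranged in Z-order (assumed here); elements inside a tile
    are stored row-major (also an assumption). Requires n == tile * 2**k.
    """
    tiles_per_side = n // tile
    bits = (tiles_per_side - 1).bit_length()
    tile_index = interleave_bits(i // tile, j // tile, bits)
    return tile_index * tile * tile + (i % tile) * tile + (j % tile)


if __name__ == "__main__":
    n, tile = 8, 2
    for i in range(n):
        print([tiled_morton_offset(i, j, n, tile) for j in range(n)])
```

Because only the tile coordinates are interleaved, the bit manipulation is paid once per tile rather than once per element, which is one plausible reading of why stopping the recursion at tiles keeps addressing overhead low.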