  • DocumentCode
    3585611
  • Title
    Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization
  • Author
    Siddhartha; Kapre, Nachiket
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2014
  • Firstpage
    252
  • Lastpage
    255
  • Abstract
    Performance of FPGA-based token dataflow architectures is often limited by the long-tailed distribution of parallelism in the compute paths of the dataflow graphs. This is known to limit the speedup of dataflow processing of sparse LU factorization to only 3-10x over CPUs. One reason for this limitation is the serialization penalty of processing high-fanout nodes in the dataflow graph on traditional dataflow processing architectures. In this paper, we show how to apply one-time static fanout decomposition and selective node replication transformations to input dataflow graphs. These transformations incur a one-time static compute cost that is typically amortized over millions of iterations. For dataflow graphs extracted for sparse LU factorization, we demonstrate up to 2.3x speedup (1.2x geometric mean) with this technique across a range of benchmark problems.
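The abstract does not spell out how fanout decomposition is performed. The sketch below assumes the common form of the transformation: a node whose successor list exceeds a bound k is given a tree of forwarding nodes, so that no single node has to serialize more than k token sends. The trade-off is extra hops of latency (about ceil(log_k n) levels for n successors) in exchange for spreading the sends across several processing elements. The function name `decompose_fanout`, the parameter `max_fanout`, and the `("fwd", n)` naming of forwarding nodes are hypothetical illustrations, not the authors' implementation; selective node replication is not shown.

```python
from collections import defaultdict
from itertools import count


def decompose_fanout(edges, max_fanout):
    """Hypothetical sketch of static fanout decomposition.

    edges: dict mapping each node to the list of its successor nodes.
    Returns a new dict in which every node has at most max_fanout
    successors. Extra forwarding nodes are named ("fwd", i); this
    assumes the input graph does not already use such tuples as names.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    fresh = count()          # source of unique ids for forwarding nodes
    out = defaultdict(list)
    for src, dsts in edges.items():
        level = list(dsts)
        # Build the forwarding tree bottom-up: group the current level into
        # chunks of max_fanout, give each chunk a forwarding parent, and
        # repeat until the remaining level fits directly under src.
        while len(level) > max_fanout:
            parents = []
            for i in range(0, len(level), max_fanout):
                fwd = ("fwd", next(fresh))
                out[fwd].extend(level[i:i + max_fanout])
                parents.append(fwd)
            level = parents
        out[src].extend(level)
    return dict(out)
```

For example, a node `"a"` with 10 successors and `max_fanout=3` gets 4 first-level forwarders (holding 3, 3, 3 and 1 successors), which are in turn grouped under 2 second-level forwarders, so `"a"` itself sends only 2 tokens. A production version would likely balance chunk sizes and place forwarders with the processing-element mapping in mind.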
  • Keywords
    data flow graphs; field programmable gate arrays; logic design; optimisation; FPGA-based sparse LU factorization; FPGA-based token dataflow architectures; compute paths; dataflow graphs; dataflow processing architectures; fanout decomposition dataflow optimizations; high-fanout nodes; node replication transformations; one-time static fanout decomposition; serialization penalty; Benchmark testing; Circuit simulation; Computer architecture; Hardware; Optimization; Parallel processing; Sparse matrices;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Field-Programmable Technology (FPT), 2014 International Conference on
  • Print_ISBN
    978-1-4799-6244-0
  • Type
    conf
  • DOI
    10.1109/FPT.2014.7082787
  • Filename
    7082787