Title :
Spatial Processors Interconnected for Concurrent Execution for Accelerating the SPICE Circuit Simulator Using an FPGA
Author :
Kapre, Nachiket ; DeHon, André
Author_Institution :
Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK
Abstract :
Spatial processing of sparse, irregular, double-precision floating-point computation using a single field-programmable gate array (FPGA) enables up to an order-of-magnitude speedup (mean 2.8×) over a conventional microprocessor for the SPICE circuit simulator. We develop a parallel, FPGA-based, heterogeneous architecture customized for accelerating the SPICE simulator to deliver this speedup. To properly parallelize the complete simulator, we decompose SPICE into its three constituent phases (model evaluation, sparse matrix solve, and iteration control) and customize a spatial architecture for each phase independently. Our heterogeneous FPGA organization mixes very long instruction word (VLIW), dataflow, and streaming architectures into a cohesive, unified design to match the parallel patterns exposed by our programming framework. This FPGA architecture is able to outperform conventional processors due to a combination of factors, including high utilization of statically scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, and streaming, overlapped processing of the control algorithms. We demonstrate that we can independently accelerate model evaluation by a mean factor of 6.5× (1.4-23×) across a range of nonlinear device models and matrix solve by a mean factor of 2.4× (0.6-13×) across various benchmark matrices, while delivering a mean combined speedup of 2.8× (0.2-11×) for the composite design when comparing a Xilinx Virtex-6 LX760 (40 nm) with an Intel Core i7 965 (45 nm). For the same comparison, we also estimate mean energy savings of 8.9× (up to 40.9×).
Keywords :
SPICE; circuit simulation; field programmable gate arrays; floating point arithmetic; integrated circuit interconnections; microprocessor chips; sparse matrices; FPGA; Intel Core i7 965; SPICE circuit simulator acceleration; Xilinx Virtex-6 LX760; benchmark matrix; dataflow architecture; floating-point computation; heterogeneous architecture; iteration control phasing; low-overhead dataflow scheduling; mean energy saving estimation; microprocessor; model evaluation phasing; nonlinear device model; size 40 nm; size 45 nm; sparse matrix-solve phasing; spatial processor interconnected for concurrent execution; statically-scheduled resource utilization; streaming architecture; Computational modeling; Integrated circuit modeling; Mathematical model; Runtime; Parallelism; reconfigurable logic; simulation
Journal_Title :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
DOI :
10.1109/TCAD.2011.2173199