Title :
Sparse matrix-vector multiply on the Keystone II Digital Signal Processor
Author :
Yang Gao; Fan Zhang; Jason D. Bakos
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Carolina, Columbia, SC, USA
Abstract :
In this paper we describe an implementation of sparse matrix-vector multiply (SpMV) on the Texas Instruments (TI) Keystone II architecture. The Keystone II is an eight-core Digital Signal Processor (DSP) that offers floating-point performance comparable to a desktop CPU while having a power envelope comparable to a mobile embedded CPU. This, combined with its integrated communication interfaces, potentially makes it a scalable and efficient HPC processor technology. For this architecture, the key to achieving high computational efficiency is the careful use of its on-chip scratchpad memory. SpMV is an HPC kernel that is memory bound and has an irregular memory access pattern. When tuning this kernel, we found that using scratchpad can provide as much as a 50% improvement in effective memory bandwidth compared to using cache, but only with careful scratchpad allocation and run-time management. This includes the selection of tile size, the mapping of arrays to specific on-chip memory structures, and the methods by which DMA is performed in parallel with computation.
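For readers unfamiliar with the kernel, the following is a minimal illustrative sketch of SpMV on a matrix stored in compressed sparse row (CSR) form, together with a row-tiled variant. It is not the authors' implementation: the CSR layout, the function names `spmv_csr` and `spmv_csr_tiled`, the `tile_rows` parameter, and the sample data are all assumptions made for illustration. The slice copies in the tiled version only stand in for a DMA transfer into scratchpad; on real hardware that transfer would be issued asynchronously so that it overlaps with computation on the previous tile.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Reference CSR SpMV: returns y = A @ x as a list of floats."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # col_idx[k] makes this an indirect, irregular read of x.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_csr_tiled(row_ptr, col_idx, vals, x, tile_rows=2):
    """Same result, computed one tile of rows at a time.

    Each tile's values and column indices are copied into local
    buffers first (a hypothetical stand-in for a DMA into scratchpad).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for start in range(0, n_rows, tile_rows):
        end = min(start + tile_rows, n_rows)
        lo, hi = row_ptr[start], row_ptr[end]
        tile_vals = vals[lo:hi]      # contiguous, streamable data
        tile_cols = col_idx[lo:hi]   # contiguous, streamable data
        for i in range(start, end):
            acc = 0.0
            # Offsets are rebased to the start of the local buffers.
            for k in range(row_ptr[i] - lo, row_ptr[i + 1] - lo):
                acc += tile_vals[k] * x[tile_cols[k]]
            y[i] = acc
    return y


# Sample 3x3 matrix (assumed data): [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
ROW_PTR = [0, 2, 3, 5]
COL_IDX = [0, 2, 1, 0, 2]
VALS = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [1.0, 2.0, 3.0]
```

In this sketch the matrix values and column indices are read contiguously, which is what makes them candidates for staging, while the reads of `x` remain irregular. The tile size determines how much data each staged transfer carries.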
Keywords :
cache storage; digital signal processing chips; floating point arithmetic; memory architecture; parallel processing; sparse matrices; DMA; DSP; HPC processor technology; SpMV; Texas Instruments keystone II architecture; cache; computational efficiency; desktop CPU; floating point performance; integrated communication interface; irregular memory access pattern; keystone II digital signal processor; memory bandwidth; mobile embedded CPU; on-chip memory structure; on-chip scratchpad memory; power envelope; run-time management; scratchpad allocation; sparse matrix-vector multiply; tile size selection; Arrays; Bandwidth; Digital signal processing; Graphics processing units; Kernel; Random access memory; Sparse matrices;
Conference_Title :
High Performance Extreme Computing Conference (HPEC), 2014 IEEE
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4799-6232-7
DOI :
10.1109/HPEC.2014.7040985