DocumentCode
3704251
Title
Energy-Efficient Sorting with the Distributed Memory Architecture ePUMA
Author
Andréas ;Joar Sohl;Dake Liu
Author_Institution
Dept. of Electr. Eng., Linkoping Univ., Linkoping, Sweden
Volume
3
fYear
2015
Firstpage
116
Lastpage
123
Abstract
This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks suitable for distributed memory architectures, which allows very simple scheduling and predictable data-independent sorting time. The implementation on ePUMA utilizes the architecture's specialized compute cores and control cores, and local memory parallelism, to separate and overlap sorting with data access and control for close to stall-free sorting. Penalty-free unaligned and out-of-order local memory access is used in combination with proposed application-specific sorting instructions to derive highly efficient local sorting and merging kernels used by the system-level algorithm. Our evaluation shows that the proposed implementation can rival the sorting performance of high-performance commercial CPUs and GPUs, with two orders of magnitude higher energy efficiency, which would allow high-performance sorting on low-power devices.
Keywords
"Sorting","Gold","Digital signal processing","Registers","Memory management","Hardware"
Publisher
ieee
Conference_Titel
Trustcom/BigDataSE/ISPA, 2015 IEEE
Type
conf
DOI
10.1109/Trustcom.2015.620
Filename
7345636