DocumentCode
579755
Title
Efficient Sorting on the Tilera Manycore Architecture
Author
Morari, Alessandro ; Tumeo, Antonino ; Villa, Oreste ; Secchi, Simone ; Valero, Mateo
Author_Institution
Pacific Northwest Nat. Lab., Richland, WA, USA
fYear
2012
fDate
24-26 Oct. 2012
Firstpage
171
Lastpage
178
Abstract
We present an efficient implementation of the radix sort algorithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is composed of 64 tiles interconnected through multiple fast networks-on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor's sustained performance. We discuss the overall throughput reached by our radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the-art implementations on x86 processors and graphics processing units.
Keywords
graphics processing units; network-on-chip; optimisation; shared memory systems; sorting; Tilera TILEPro64 processor; Tilera manycore architecture; commercial manycore processors; graphic processing units; networks-on-chip; optimization strategies; radix sort algorithm; shared distributed cache; Bandwidth; Computer architecture; Histograms; Instruction sets; Optimization; Sorting; Tiles;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on
Conference_Location
New York, NY
ISSN
1550-6533
Print_ISBN
978-1-4673-4790-7
Type
conf
DOI
10.1109/SBAC-PAD.2012.41
Filename
6374786