DocumentCode :
24072
Title :
Achieving High-Performance On-Chip Networks With Shared-Buffer Routers
Author :
Tran, A.T. ; Baas, Bevan M.
Author_Institution :
Electr. & Comput. Eng. (ECE) Dept., Univ. of California at Davis, Davis, CA, USA
Volume :
22
Issue :
6
fYear :
2014
fDate :
June 2014
Firstpage :
1391
Lastpage :
1403
Abstract :
On-chip routers typically have buffers dedicated to their input or output ports for temporarily storing packets when contention occurs on output physical channels. Buffers, unfortunately, consume significant portions of the router area and power budgets. While running a traffic trace, however, not all input ports of a router have incoming packets that need to be transferred simultaneously. As a result, a large number of buffer queues in the network are empty while other queues are mostly busy. This observation motivates us to design a router architecture with shared queues (RoShaQ), which maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports. Sharing queues makes buffer use more efficient and hence achieves higher throughput when the network load becomes heavy. On the other hand, at light traffic load, our router achieves low latency by allowing packets to effectively bypass these shared queues. Experimental results on a 65-nm CMOS standard-cell process show that over synthetic traffic RoShaQ has 17% less latency and 18% higher saturation throughput than a typical virtual-channel (VC) router. Because of its higher performance, RoShaQ consumes 9% less energy per transferred packet than the VC router given the same buffer space capacity. Over real multitask applications and E3S embedded benchmarks mapped with the near-optimal NMAP algorithm, RoShaQ has 32% lower latency than the VC router and, when targeting the same application throughput, consumes 30% lower energy per packet.
Keywords :
CMOS integrated circuits; buffer circuits; network routing; network-on-chip; CMOS standard-cell process; E3S embedded benchmarks; RoShaQ; buffer utilization; light traffic load; multiple buffer queues; multitask applications; near-optimal NMAP mapping; network load; on-chip networks; on-chip routers; power budgets; router architecture; router area; shared queues; shared-buffer routers; size 65 nm; synthetic traffics; Application mapping; networks on-chip; shared-buffer
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2013.2268548
Filename :
6553191