Title :
Addressing queuing bottlenecks at high speeds
Author :
Kumar, Sailesh ; Turner, Jonathan ; Crowley, Patrick
Author_Institution :
Dept. of Comput. Sci. & Eng., Washington Univ., St. Louis, MO, USA
Abstract :
Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high-performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage, off-chip SRAM for the linked-list nodes and packet lengths, and on-chip SRAM for the queue headers. This paper focuses on the performance bottlenecks associated with the use of off-chip SRAM. We show how the combination of implicit buffer pointers and multi-buffer list nodes can dramatically reduce the impact of the buffering and queuing subsystem on queuing performance. We also show how combining these techniques with coarse-grained scheduling can improve the performance of fair queuing algorithms while also reducing the amount of off-chip memory and bandwidth needed. These techniques can reduce the amount of SRAM needed to hold the list nodes by a factor of 10 at the cost of about 10% wastage of the DRAM space, assuming an aggregation degree of 16.
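The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the buffer size, the function names, and the exact address mapping are assumptions made for illustration. Only the aggregation degree of 16 comes from the abstract. The sketch assumes that a list node's index in SRAM implies the DRAM location of its buffers (so no explicit buffer pointer is stored), and that one list node covers a group of consecutive buffers.

```python
# Illustrative sketch only; names and sizes are assumptions, not from the paper.
BUFFER_BYTES = 64          # assumed DRAM buffer size
AGGREGATION_DEGREE = 16    # buffers covered by one list node (value from the abstract)


def buffer_address(node_index: int, slot: int) -> int:
    """Implicit buffer pointer: derive the DRAM address from the node index.

    The list node stores no buffer pointer; its position in SRAM
    determines which group of DRAM buffers it describes.
    """
    if not 0 <= slot < AGGREGATION_DEGREE:
        raise ValueError("slot out of range for this list node")
    return (node_index * AGGREGATION_DEGREE + slot) * BUFFER_BYTES


def list_nodes_needed(num_buffers: int, degree: int = AGGREGATION_DEGREE) -> int:
    """Number of list nodes required when each node covers `degree` buffers."""
    return -(-num_buffers // degree)  # ceiling division


# Example: 1000 buffers need 1000 nodes at degree 1 but only 63 at degree 16.
nodes_single = list_nodes_needed(1000, degree=1)
nodes_grouped = list_nodes_needed(1000)
```

Under these assumptions, a partially filled group still occupies a whole node's worth of DRAM buffers, which is one plausible source of the DRAM wastage the abstract mentions.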
Keywords :
DRAM chips; SRAM chips; buffer storage; packet switching; queueing theory; scheduling; DRAM; coarse-grained scheduling; data storage; fair queuing algorithm; linked-list node; off-chip SRAM; on-chip SRAM; packet buffering; queue header; queuing subsystem; switch fabric; Bandwidth; Buffer storage; Clocks; Costs; Global Positioning System; Internet; Processor scheduling; Random access memory; Round robin; Sorting;
Conference_Title :
High Performance Interconnects, 2005. Proceedings. 13th Symposium on
Print_ISBN :
0-7695-2449-4
DOI :
10.1109/CONECT.2005.7