Title :
Architecture Support for Improving Bulk Memory Copying and Initialization Performance
Author :
Jiang, Xiaowei ; Solihin, Yan ; Zhao, Li ; Iyer, Ravishankar
Author_Institution :
Dept. of Electr. & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA
Abstract :
Bulk memory copying and initialization are among the most ubiquitous operations performed in current computer systems, by both user applications and operating systems. While many current systems rely on a loop of loads and stores, there are proposals to introduce a single instruction to perform bulk memory copying. Such an instruction can improve performance by generating fewer TLB and cache accesses and requiring fewer pipeline resources. In this paper, however, we show that the key to significantly improving performance is removing the pipeline and cache bottlenecks of the code that follows such instructions. We show that the bottlenecks arise due to (1) the pipeline being clogged by the copying instruction, (2) a lengthened critical path due to dependent instructions stalling while waiting for the copying to complete, and (3) the inability to specify (separately) the cacheability of the source and destination regions. We propose FastBCI, an architecture support mechanism that achieves the granularity efficiency of a bulk copying/initialization instruction, but without its pipeline and cache bottlenecks. When applied to OS kernel buffer management, we show that on average FastBCI achieves speedups of between 23% and 32%, which is roughly 3x-4x that of an alternative scheme, and 1.5x-2x that of a highly optimistic DMA with zero setup and interrupt overheads.
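For illustration only, the "loop of loads and stores" baseline mentioned in the abstract might look like the C sketch below. It is not taken from the paper and does not model FastBCI; the function names loop_copy and loop_init and the 64-bit word granularity are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline, not from the paper: copy n_words 64-bit words.
 * As written, each iteration is one load and one store, so the number of
 * memory instructions grows linearly with the size of the region. */
void loop_copy(uint64_t *dst, const uint64_t *src, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical baseline, not from the paper: initialize n_words 64-bit
 * words to a fixed value, one store per word. */
void loop_init(uint64_t *dst, uint64_t value, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        dst[i] = value;
    }
}
```

An optimizing compiler may replace either loop with a library call or vector instructions, so the sketch shows only the instruction-per-word structure that a single bulk instruction is meant to avoid.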
Keywords :
operating system kernels; storage management; architecture support; bulk memory copying; cache access; cache bottleneck; cacheability; computer system; initialization performance; operating system kernel buffer management; ubiquitous operation; Application software; Computer architecture; Concurrent computing; Kernel; Memory management; Operating systems; Parallel architectures; Pervasive computing; Pipelines; TCPIP; cache affinity; cache neutral; early retirement; memory copying; memory initialization;
Conference_Title :
Parallel Architectures and Compilation Techniques, 2009. PACT '09. 18th International Conference on
Conference_Location :
Raleigh, NC
Print_ISBN :
978-0-7695-3771-9
DOI :
10.1109/PACT.2009.31