DocumentCode :
3537779
Title :
Software and Hardware Co-designed Multi-level TLBs for Chip Multiprocessors
Author :
Zhang, Xiaohui ; Cong, Ming ; Chen, Guangqiang
Author_Institution :
Inst. of Comput. Technol., Key Lab. of Comput. Syst. & Archit., Chinese Acad. of Sci., Beijing, China
fYear :
2011
fDate :
Aug. 31 2011-Sept. 2 2011
Firstpage :
609
Lastpage :
614
Abstract :
Translation Lookaside Buffers (TLBs) have a significant impact on system performance. Numerous prior studies focus on TLB design for uniprocessors; with the advent of chip multiprocessors (CMPs), which today typically employ private per-core TLBs, TLB design must shift toward CMPs. This paper presents SL2-TLB, a software-implemented level-two TLB shared by all cores of a multiprocessor. It not only reduces the cost of the TLB refill handler on every processor core, but also effectively reduces the cost of redundant TLB misses in CMPs. Together with the hardware TLBs, SL2-TLB forms a software-hardware co-designed multilevel TLB system that brings a substantial performance benefit without changing the hardware TLB, making it a convenient and efficient way to improve TLB performance in CMPs. The benefit of SL2-TLB is smaller for SPEC CPU2000 than for SPEC CPU2006: about 5% and 7%, respectively. Within SPEC CPU2006, the average performance improvement for SPECint 2006 reaches about 12.7%. The smaller benefit for applications with small memory footprints arises because the cache is large enough that page-table walks rarely miss, so the TLB refill overhead is already small. A further optimization uses a cache-locking scheme to keep the SL2-TLB table permanently resident in the L2 cache. SL2-TLB combined with cache locking improves SPECint 2006 performance by over 13%, and yields an average performance improvement of over 7% on the emerging parallel benchmark suite PARSEC (Princeton Application Repository for Shared-Memory Computers). All of these evaluations were performed on Godson-3 processors, the latest generation of China's most powerful microprocessor family.
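To illustrate the two-level lookup described above, the following Python model is a minimal sketch under stated assumptions, not the authors' implementation. The direct-mapped table layout, the `(asid, vpn)` tags, the XOR index function, the LRU private TLB, and all class and field names are hypothetical. It shows how a shared software table lets one core's page-table walk satisfy later misses on other cores, which is the redundant-miss reduction SL2-TLB targets.

```python
from collections import OrderedDict


class SharedSoftwareL2TLB:
    """Hypothetical direct-mapped software TLB table shared by all cores."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        # Each slot holds (asid, vpn, pfn) or None; tags are checked on lookup.
        self.entries = [None] * num_entries

    def _index(self, asid, vpn):
        # Assumed index function; the paper's actual hashing is not given here.
        return (vpn ^ asid) % self.num_entries

    def lookup(self, asid, vpn):
        slot = self.entries[self._index(asid, vpn)]
        if slot is not None and slot[0] == asid and slot[1] == vpn:
            return slot[2]
        return None

    def insert(self, asid, vpn, pfn):
        # Direct-mapped: a new entry overwrites whatever occupied the slot.
        self.entries[self._index(asid, vpn)] = (asid, vpn, pfn)


class Core:
    """One core with a small private (hardware-like) TLB using LRU replacement."""

    def __init__(self, l1_entries, sl2, page_table, stats):
        self.l1 = OrderedDict()
        self.l1_entries = l1_entries
        self.sl2 = sl2
        self.page_table = page_table
        self.stats = stats

    def translate(self, asid, vpn):
        key = (asid, vpn)
        if key in self.l1:
            self.l1.move_to_end(key)  # mark as most recently used
            self.stats["l1_hits"] += 1
            return self.l1[key]
        # Private TLB miss: the software refill handler consults the shared table first.
        pfn = self.sl2.lookup(asid, vpn)
        if pfn is not None:
            self.stats["sl2_hits"] += 1
        else:
            # Shared-table miss: fall back to a (modeled) page-table walk.
            self.stats["page_walks"] += 1
            pfn = self.page_table[key]
            self.sl2.insert(asid, vpn, pfn)
        if len(self.l1) >= self.l1_entries:
            self.l1.popitem(last=False)  # evict the least recently used entry
        self.l1[key] = pfn
        return pfn


# Two threads of one process (ASID 1) on two cores touch the same 32 pages.
page_table = {(1, vpn): 1000 + vpn for vpn in range(64)}
stats = {"l1_hits": 0, "sl2_hits": 0, "page_walks": 0}
sl2 = SharedSoftwareL2TLB(num_entries=256)
cores = [Core(8, sl2, page_table, stats) for _ in range(2)]
for core in cores:
    for vpn in range(32):
        core.translate(1, vpn)
print(stats)  # {'l1_hits': 0, 'sl2_hits': 32, 'page_walks': 32}
```

In this run the second core resolves all 32 of its private-TLB misses from the shared table, so the page table is walked 32 times instead of 64. A direct-mapped table keeps the software lookup path short, at the cost of conflict overwrites; the real SL2-TLB organization may differ.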
Keywords :
cache storage; circuit optimisation; hardware-software codesign; microprocessor chips; paged storage; performance evaluation; shared memory systems; CMP; Godson-3 processor; L2 cache; PARSEC; Princeton application repository for shared-memory computers; SL2-TLB optimization; SPECint 2006 performance improvement; TLB refill handler; cache locking scheme; chip multiprocessor; memory footprints; multilevel TLB design; page table; parallel benchmark suite; processor core; software-hardware codesign; software-implemented level-two TLB; translation lookaside buffer; Benchmark testing; Computer architecture; Computers; Hardware; Kernel; Prefetching; System performance; Chip Multiprocessors (CMPs); Godson-3 processors; Software-implemented shared level-two TLB; parallel benchmarks; redundant TLB misses;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on
Conference_Location :
Pafos
Print_ISBN :
978-1-4577-0383-6
Electronic_ISBN :
978-0-7695-4388-8
Type :
conf
DOI :
10.1109/CIT.2011.17
Filename :
6036833