Impact of chip-level integration on performance of OLTP workloads

Author

Barroso, Luiz André ; Gharachorloo, Kourosh ; Nowatzyk, Andreas ; Verghese, Ben

Author_Institution

Western Res. Lab., Compaq Comput. Corp., Houston, TX, USA

fYear

2000

fDate

2000

Firstpage

Lastpage

Abstract

With increasing chip densities, future microprocessor designs have the opportunity to integrate many of the traditional system-level modules onto the same chip as the processor. Some current designs already integrate extremely large on-chip caches, and there are aggressive next-generation designs that attempt to also integrate the memory controller, coherence hardware, and network router all onto a single chip. The tight coupling of these modules will enable efficient memory systems with substantially better latency and bandwidth characteristics relative to current designs. Among the important application areas for high-performance servers, online transaction processing (OLTP) workloads are likely to benefit most from these trends due to their large instruction and data footprints and high communication miss rates. This paper examines the design trade-offs that arise as more system functionality is integrated onto the processor chip, and identifies a number of important architectural choices that are influenced by chip-level integration. In addition, the paper presents a detailed study of the performance impact of chip-level integration in the context of OLTP workloads. Our results are based on full system simulations of the Oracle commercial database engine running on both in-order and out-of-order issue processors used in uniprocessor and multiprocessor configurations. The results show that chip-level integration can improve the performance of both configurations by about 1.4 to 1.5 times, though for different reasons. For uniprocessors, integration of the L2 cache and the resulting lower hit latency is the primary factor in performance improvement. For multiprocessors, the improvement comes from both the integration of the L2 cache (lower L2 hit latency) and the integration of the other memory system components (better dirty remote latency). 
Furthermore, we find that the higher associativity afforded by integrating the L2 cache plays a critical role in counteracting the loss of capacity relative to larger off-chip caches. Finally, we find that the relative gains from chip-level integration are virtually identical for in-order and out-of-order processors.

Keywords

data mining; multiprocessing systems; performance evaluation; OLTP workloads; Oracle commercial database engine; architectural choices; chip-level integration; coherence hardware; high-performance servers; memory controller; microprocessor designs; multiprocessor configurations; network router; online transaction processing; system functionality; system simulations; system-level modules; uniprocessor; Bandwidth; Databases; Delay; Hardware; Microprocessors; Network servers; Network-on-a-chip; Next generation networking; Out of order; Process design

fLanguage

English

Publisher

IEEE

Conference_Title

High-Performance Computer Architecture, 2000. HPCA-6. Proceedings. Sixth International Symposium on

Conference_Location

Toulouse

Print_ISBN

0-7695-0550-3

Type

conf

DOI

10.1109/HPCA.2000.824334

Filename

824334

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1647174