• DocumentCode
    2582956
  • Title

    Scale-out processors

  • Author

    Lotfi-Kamran, Pejman ; Grot, Boris ; Ferdman, Michael ; Volos, Stavros ; Kocberber, Onur ; Picorel, Javier ; Adileh, Almutaz ; Jevdjic, Djordje ; Idgunji, Sachin ; Ozer, Emre ; Falsafi, Babak

  • Author_Institution
    EcoCloud, EPFL, Lausanne, Switzerland
  • fYear
    2012
  • fDate
    9-13 June 2012
  • Firstpage
    500
  • Lastpage
    511
  • Abstract
    Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency when instructions are fetched. Performance on scale-out workloads is maximized through a modestly-sized last-level cache that captures the instruction footprint at the lowest possible access latency. In this work, we introduce a methodology for designing scalable and efficient scale-out server processors. Based on a metric of performance-density, we facilitate the design of optimal multi-core configurations, called pods. Each pod is a complete server that tightly couples a number of cores to a small last-level cache using a fast interconnect. Replicating the pod to fill the die area yields processors which have optimal performance density, leading to maximum per-chip throughput. Moreover, as each pod is a stand-alone server, scale-out processors avoid the expense of global (i.e., inter-pod) interconnect and coherence. These features synergistically maximize throughput, lower design complexity, and improve technology scalability. In 20nm technology, scale-out chips improve throughput by 5x-6.5x over conventional and by 1.6x-1.9x over emerging tiled organizations.
  • Keywords
    computer centres; performance evaluation; TCO investment; data serving; instruction footprint; maximum perchip; optimal multicore configurations; optimal performance density; scale out datacenters; scale out server processors; scale out workloads; scale-out processors; server chips; stand-alone server; web search; Coherence; Computer architecture; Delay; Organizations; Program processors; Servers; Throughput
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture (ISCA), 2012 39th Annual International Symposium on
  • Conference_Location
    Portland, OR
  • ISSN
    1063-6897
  • Print_ISBN
    978-1-4673-0475-7
  • Electronic_ISBN
    1063-6897
  • Type

    conf

  • DOI
    10.1109/ISCA.2012.6237043
  • Filename
    6237043