  • DocumentCode
    3129688
  • Title
    Wire delay is not a problem for SMT (in the near future)
  • Author
    Vijaykumar, T.N. ; Chishti, Zeshan
  • Author_Institution
    School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
  • fYear
    2004
  • fDate
    19-23 June 2004
  • Firstpage
    40
  • Lastpage
    51
  • Abstract
    Previous papers have shown that the slow scaling of wire delays compared to logic delays will prevent superscalar performance from scaling with technology. In this paper, we show that the optimal pipeline for superscalar becomes shallower with technology when wire delays are considered, tightening previous results that deeper pipelines perform only as well as shallower pipelines. The key reason for the lack of performance scaling is that superscalar does not have sufficient parallelism to hide the relatively increased wire delays. However, Simultaneous Multithreading (SMT) provides the much-needed parallelism. We show that an SMT running a multiprogrammed workload with just 4-way issue not only retains the optimal pipeline depth over technology generations, enabling at least a 43% increase in clock speed every generation, but also achieves the remainder of the expected speedup of two per generation through IPC. As wire delays become more dominant in future technologies, the number of programs needs to be scaled modestly to maintain the scaling trends, at least until the near-future 50 nm technology. This result ignores bandwidth constraints, however, and using SMT to tolerate latency due to wire delays is not that simple because SMT causes bandwidth problems. Most of the stages of a modern out-of-order-issue pipeline employ RAM and CAM structures. Wire delays in conventional, latency-optimized RAM/CAM structures prevent them from being pipelined in a scaled manner. We show that this limitation prevents scaling of SMT throughput. We use bitline scaling to allow RAM/CAM bandwidth to scale with technology. Bitline scaling enables SMT throughput to scale at the rate of two per technology generation in the near future.
  • Keywords
    multi-threading; multiprocessing systems; parallel processing; pipeline processing; random-access storage; CAM structure; RAM structure; RAM/CAM bandwidth; bandwidth constraints; bitline scaling; clock speed; latency tolerance; latency-optimized RAM/CAM structures; logic delays; multiprogrammed workload; optimal pipeline depth; performance scaling; simultaneous multithreading; superscalar performance; wire delay scaling; Bandwidth; CADCAM; Computer aided manufacturing; Delay; Logic; Paper technology; Pipelines; Surface-mount technology; Throughput; Wire
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004
  • ISSN
    1063-6897
  • Print_ISBN
    0-7695-2143-6
  • Type
    conf
  • DOI
    10.1109/ISCA.2004.1310762
  • Filename
    1310762