• DocumentCode
    3473169
  • Title

    Effective usage of vector registers in decoupled vector architectures

  • Author

    Villa, Luis ; Espasa, Roger ; Valero, Mateo

  • Author_Institution
    Dept. d'Arquitectura de Comput., Univ. Politecnica de Catalunya, Barcelona, Spain
  • fYear
    1998
  • fDate
    21-23 Jan 1998
  • Firstpage
    495
  • Lastpage
    501
  • Abstract
    The paper presents a study of the impact of reducing the vector register size in a decoupled vector architecture. In traditional in-order vector architectures, long vector registers have typically been the norm. The authors present data showing that, even for highly vectorizable codes, only a small fraction of the elements of a long vector register are actually used. They also show that reducing the register size in a traditional vector architecture, in an attempt to reduce hardware cost and maximize register utilization, results in severe performance degradation. However, when they combine the decoupling technique with the vector register reduction, the resulting architecture tolerates the register size cuts very well. They simulate a selection of Perfect Club and Specfp92 programs using a trace-driven approach and compare the execution time of a conventional vector architecture with that of a decoupled vector architecture using different register sizes. Halving the register size and using decoupling provides speedups between 1.04 and 1.49 over a traditional in-order vector machine. Even when the register length is reduced to 1/4 of the original size (and in some cases, to 1/8), the performance of the decoupled machine is better than that of a conventional vector model. Moreover, they observe that the resulting decoupled machine with short registers tolerates long memory latencies very well.
  • Keywords
    computational complexity; performance evaluation; vector processor systems; virtual machines; Perfect Club programs; Specfp92 programs; decoupled vector architectures; decoupling technique; execution time; hardware cost reduction; in-order vector architectures; long memory latencies; maximized register utilization; simulation; trace driven approach; vector register size reduction; vector registers; Computer architecture; Costs; Degradation; Delay; Engines; Hardware; Out of order; Registers; Space technology; Vector processors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing, 1998. PDP '98. Proceedings of the Sixth Euromicro Workshop on
  • Conference_Location
    Madrid
  • Print_ISBN
    0-8186-8332-5
  • Type

    conf

  • DOI
    10.1109/EMPDP.1998.647238
  • Filename
    647238