• DocumentCode
    3196580
  • Title
    Performance analysis of SVD algorithm on the Trident processor
  • Author
    Soliman, Mostafa I. ; Sedukhin, Stanislav G.
  • Author_Institution
    Graduate Sch. of Comput. Sci. & Eng., Univ. of Aizu, Japan
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    95
  • Lastpage
    102
  • Abstract
    Within the current decade, process technology promises one billion transistors on a single die, operating at frequencies of 6 to 10 GHz. As a direct result of the fundamental trends of increasing transistor density and switching speed, new technological and microarchitectural design constraints are introduced. Among them, wire delays will become critical. To exploit the benefits of VLSI technology, we proposed the Trident processor, which emphasizes local communication. Like vector architectures, the Trident processor extends a scalar core with parallel lanes; each lane contains an execution datapath and a slice of the register file. However, the Trident processor uses ring and communication registers, both based on local communication, to store and cyclically shift 1-D data within and across the lanes, respectively. By using parallel datapaths, ring registers, and communication registers, the Trident processor can effectively process not only vector but also matrix data. In this paper, the performance of the Trident processor on the singular value decomposition (SVD) algorithm is evaluated. On a 500×600 input matrix, a four-lane Trident processor significantly reduces the number of instructions (44 times fewer), loop overhead (30 times less), and load/store operations (3 times fewer) compared with scalar code. Moreover, the Trident processor is scalable, and scaling it requires only replicating lanes to process longer vectors or larger matrices (eight lanes speed up SVD by 2.5 times over four lanes).
  • Keywords
    parallel architectures; performance evaluation; singular value decomposition; SVD algorithm; Trident processor; VLSI technology; communication registers; execution datapath; loop overhead; microarchitectural design constraints; parallel datapaths; performance analysis; process technology; wire delays; Communication switching; Delay; Frequency; Matrix decomposition; Microarchitecture; Performance analysis; Registers; Singular value decomposition; Very large scale integration; Wire
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cyber Worlds, 2002. Proceedings. First International Symposium on
  • Print_ISBN
    0-7695-1862-1
  • Type
    conf
  • DOI
    10.1109/CW.2002.1180865
  • Filename
    1180865