Performance analysis of SVD algorithm on the Trident processor

Author

Soliman, Mostafa I. ; Sedukhin, Stanislav G.

Author_Institution

Graduate Sch. of Comput. Sci. & Eng., Univ. of Aizu, Japan

fYear

2002

fDate

2002

Firstpage

95

Lastpage

102

Abstract

Within the current decade, process technology promises one billion transistors on a single die, operating at frequencies of 6 to 10 GHz. As a direct result of the fundamental trends of increasing transistor density and switching speed, new technological and microarchitectural design constraints are introduced. Among them, wire delays will become critical. To take advantage of VLSI technology, we proposed the Trident processor, which emphasizes local communication. Like vector architectures, the Trident processor extends a scalar core with parallel lanes; each lane contains an execution datapath and a slice of the register file. However, the Trident processor uses ring and communication registers, which are based on local communication, to store and cyclically shift 1-D data within and across the lanes, respectively. By using parallel datapaths, ring registers, and communication registers, the Trident processor can effectively process not only vector but also matrix data. In this paper, the performance of the Trident processor on the singular value decomposition (SVD) algorithm is evaluated. On a 500×600 input matrix, a four-lane Trident processor significantly reduces the number of instructions (44 times fewer), loop overhead (30 times less), and load/store operations (3 times fewer) compared with scalar code. Moreover, the Trident processor is scalable: scaling it requires only replicating lanes to process longer vectors or larger matrices (eight lanes can speed up SVD by a factor of 2.5 over four lanes).

Keywords

parallel architectures; performance evaluation; singular value decomposition; SVD algorithm; Trident processor; VLSI technology; communication registers; execution datapath; loop overhead; microarchitectural design constraints; parallel datapaths; performance analysis; process technology; wire delays; Communication switching; Delay; Frequency; Matrix decomposition; Microarchitecture; Performance analysis; Registers; Singular value decomposition; Very large scale integration; Wire

fLanguage

English

Publisher

IEEE

Conference_Titel

Cyber Worlds, 2002. Proceedings. First International Symposium on

Print_ISBN

0-7695-1862-1

Type

conf

DOI

10.1109/CW.2002.1180865

Filename

1180865