• DocumentCode
    3148533
  • Title
    Codevelopment of Multi-level ISA and hardware for an efficient matrix processor
  • Author
    Soliman, Mostafa I.; Al-Junaid, Abdulmajid F.
  • Author_Institution
    Electr. Eng. Dept., South Valley Univ., Aswan, Egypt
  • fYear
    2009
  • fDate
    14-16 Dec. 2009
  • Firstpage
    211
  • Lastpage
    217
  • Abstract
    The instruction set architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. A multi-level ISA is proposed to communicate data parallelism to the hardware explicitly and compactly, rather than relying on dynamic extraction by complex hardware or static extraction by sophisticated compiler techniques. This paper presents the codevelopment of a multi-level ISA and hardware for an efficient matrix processor called Mat-Core. Mat-Core extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. As in vector architectures, the data computation unit is organized in parallel lanes. On these lanes, however, Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition to scalar-vector and vector-vector instructions. Mat-Core leads to a compiler model that is efficient in terms of both performance and executable code size. On a Mat-Core with four parallel lanes, our results show performance of about 1.6, 2.1, 4.1, and 6.4 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
  • Keywords
    digital arithmetic; instruction sets; matrix multiplication; parallel architectures; Mat-Core; address generation; compiler model; data computation; data computation unit; data parallelism; data queues; general-purpose scalar processor; matrix data; matrix processor; matrix-matrix instructions; matrix-matrix multiplication; multilevel instruction set architecture; scalar-matrix instructions; scalar-vector instructions; scalar-vector multiplication; vector-matrix instructions; vector-matrix multiplication; vector-vector instructions; Clocks; Computer architecture; Concurrent computing; Data mining; Delay; Hardware; Instruction sets; Parallel processing; Program processors; Programming profession; SystemC implementation; high performance computing; multi-level ISA; performance evaluation; vector/matrix processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering & Systems, 2009. ICCES 2009. International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-5842-4
  • Electronic_ISBN
    978-1-4244-5843-1
  • Type
    conf
  • DOI
    10.1109/ICCES.2009.5383281
  • Filename
    5383281