• DocumentCode
    1763693
  • Title

    A High-Utilization Scheduling Scheme of Stream Programs on Clustered VLIW Stream Architectures

  • Author

    Guoyue Jiang ; Zhaolin Li ; Fang Wang ; Shaojun Wei

  • Author_Institution
    Inst. of Microelectron., Tsinghua Univ., Beijing, China
  • Volume
    25
  • Issue
    4
  • fYear
    2014
  • fDate
    April 2014
  • Firstpage
    840
  • Lastpage
    850
  • Abstract
    Stream architectures have emerged as a mainstream solution for computation-intensive applications owing to their abundant arithmetic units. This paper proposes a multithreading technique based on a scheduling scheme for stream programs on clustered VLIW stream architectures, aiming at optimal arithmetic-unit utilization without increasing energy consumption. The principle is to construct multiple homogeneous threads from a stream program, exposing more kernel-level parallelism for further compiler optimization. The scheduling scheme has three phases. First, threads in the stream program are replicated to construct multiple homogeneous threads. Second, time-step assignment is applied to the homogeneous multithreaded stream program to obtain efficient kernel combinations. Third, stream segmentation optimizes both memory transfers and kernel startup overheads. A set of benchmarks is used to evaluate the effectiveness of the proposed technique. Experimental results show that, by exploiting kernel-level software pipelining, the proposed technique improves performance by 20.9 percent on average while reducing energy by 7.6 percent. Average utilizations of adders and multipliers reach 77.4 and 75.8 percent, increases of 17.0 and 13.3 percent, respectively. Moreover, the proposed technique achieves an average improvement of 12.5 percent over CSMT4 while reducing energy by 12.0 percent.
  • Keywords
    adders; digital arithmetic; multi-threading; multiplying circuits; parallel architectures; pipeline processing; program compilers; scheduling; adders; arithmetic units; clustered VLIW stream architectures; computation-intensive applications; high-utilization scheduling scheme; homogeneous multithreaded stream programs; kernel combination; kernel-level parallelism; kernel-level software pipeline; memory transfer optimization; multipliers; multithreading technique; optimal arithmetic unit utilization; optimal compilation; performance improvement; startup overhead optimization; stream segmentation; time step assignment; Computer architecture; Instruction sets; Kernel; Registers; Streaming media; System-on-chip; VLIW; Stream architecture; arithmetic unit utilization; homogeneous multiple threads; scheduling scheme; stream program;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.80
  • Filename
    6482558