• DocumentCode
    3193831
  • Title

    Amplifying Embedded System Efficiency via Automatic Instruction Fusion on a Post-Manufacturing Reconfigurable Architecture Platform

  • Author

    Cheng, Allen C.

  • Author_Institution
    Univ. of Pittsburgh, Pittsburgh
  • fYear
    2008
  • fDate
    17-19 March 2008
  • Firstpage
    744
  • Lastpage
    749
  • Abstract
    Portable embedded SoC processor architects are constantly challenged by exponentially increasing demand for newer functionality, faster real-time communication, stronger security, and higher reliability, while the constraints on energy, feature size, NRE cost, and time-to-market (TTM) grow tighter than ever. Existing approaches to achieving these mutually conflicting design goals rely heavily on special-purpose accelerators (SPAs) to do the heavy lifting in the targeted embedded SoC designs. These SPAs, synthesized as either ASICs or FPGAs, are usually attached to the base processor as co-processors to execute the performance-critical regions of applications. ASIC-based SPAs achieve performance-energy efficiency at the expense of post-manufacturing programmability while incurring large NRE cost and TTM; FPGA-based SPAs retain programmability at the expense of significant energy and area increases. Furthermore, attaching these SPAs as co-processors adds considerable communication and synchronization overhead, severely compromising their initially promised benefits. This paper proposes an innovative design paradigm that moves away from the common scheme of adding co-processing ASIC/FPGA SPAs toward an integrated and reconfigurable design. Specifically, we propose a new class of embedded processor that replaces the processor's conventional ALU with a more powerful and flexible versatile processing unit (VPU). The VPU enables multiple interdependent instructions to be fused and processed together as a single atomic VPU instruction by exploring dataflow dependencies of the application code. The instruction fusion is performed automatically by a VPU-aware compiler. The optimized VPU code reduces code size and amplifies the effective processor bandwidth and capacity by eliminating transient computation and register spill code. Experimental results show speedups of up to 400%, and 150% on average, for MediaBench with negligible area increase.
  • Keywords
    coprocessors; embedded systems; field programmable gate arrays; integrated circuit design; logic design; program compilers; reconfigurable architectures; system-on-chip; ASIC based special-purpose accelerators; FPGA; VPU-aware compiler; automatic instruction fusion; effective processor bandwidth; embedded system efficiency; portable embedded SoC processor design; post-manufacturing reconfigurable architecture platform; versatile processing unit; Application specific integrated circuits; Bandwidth; Communication system security; Coprocessors; Cost function; Embedded system; Field programmable gate arrays; Reconfigurable architectures; Registers; Time to market;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Quality Electronic Design, 2008. ISQED 2008. 9th International Symposium on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    978-0-7695-3117-5
  • Type

    conf

  • DOI
    10.1109/ISQED.2008.4479831
  • Filename
    4479831