Amplifying Embedded System Efficiency via Automatic Instruction Fusion on a Post-Manufacturing Reconfigurable Architecture Platform

Author

Cheng, Allen C.

Author_Institution

Univ. of Pittsburgh, Pittsburgh

fYear

2008

fDate

17-19 March 2008

Firstpage

744

Lastpage

749

Abstract

Portable embedded SoC processor architects are constantly challenged by exponentially increasing demand for new functionality, faster real-time communication, stronger security, and higher reliability, while the constraints on energy, feature size, non-recurring engineering (NRE) cost, and time-to-market (TTM) grow tighter than ever. Existing approaches to these mutually conflicting design goals rely heavily on special-purpose accelerators (SPAs) that take over the heavy lifting in the target embedded SoC designs. These SPAs, implemented as either ASICs or FPGAs, are usually attached to the base processor as co-processors that execute the performance-critical regions of applications. ASIC-based SPAs achieve performance-energy efficiency at the expense of post-manufacturing programmability, while incurring large NRE cost and long TTM; FPGA-based SPAs retain programmability at the expense of significant energy and area increases. Furthermore, attaching these SPAs as co-processors adds considerable communication and synchronization overhead, severely compromising their promised benefits. This paper proposes a design paradigm that moves away from the common scheme of adding co-processing ASIC/FPGA SPAs toward an integrated, reconfigurable design. Specifically, we propose a new class of embedded processor that replaces the processor's conventional ALU with a more powerful and flexible versatile processing unit (VPU). The VPU enables multiple interdependent instructions to be fused and processed together as a single atomic VPU instruction by exploiting the dataflow dependencies of the application code. The instruction fusion is performed automatically by a VPU-aware compiler. The optimized VPU code reduces code size and amplifies the effective processor bandwidth and capacity by eliminating transient computation and register spill code. Experimental results on MediaBench show speedups of up to 400% and of 150% on average, with negligible area increase.
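The abstract does not describe the VPU-aware compiler's fusion algorithm, so the following is only a minimal, hypothetical sketch of the general idea of fusing interdependent instructions along dataflow edges. It assumes a basic block in single-assignment form (each register name is defined once) and treats a value as "transient" when it is read exactly once, only by the very next instruction, and is not live after the block; such a value never needs an architectural register and so can never be spilled. The names `Instr`, `fuse`, `transient_values`, `live_out`, and `max_ops` are all illustrative assumptions; `max_ops` stands in for a hypothetical hardware limit on how many operations one fused VPU instruction may contain.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dest = op(srcs...)."""
    dest: str
    op: str
    srcs: tuple


def fuse(block, live_out, max_ops=3):
    """Greedily group a basic block into fused instructions (sketch only).

    Assumes single-assignment register names. A producer joins its
    successor's group only if its result is used exactly once, that use
    is in the immediately following instruction, the value is not live
    after the block, and the group is still below max_ops operations.
    """
    uses = Counter(src for ins in block for src in ins.srcs)
    groups, current = [], []
    for i, ins in enumerate(block):
        current.append(ins)
        nxt = block[i + 1] if i + 1 < len(block) else None
        transient = (
            nxt is not None
            and ins.dest in nxt.srcs
            and uses[ins.dest] == 1
            and ins.dest not in live_out
            and len(current) < max_ops
        )
        if not transient:
            # Close the current fused group; its last result is materialized.
            groups.append(current)
            current = []
    return groups


def transient_values(groups):
    """Results produced inside a fused group that never reach a register."""
    return [ins.dest for group in groups for ins in group[:-1]]


block = [
    Instr("t1", "mul", ("a", "b")),
    Instr("t2", "add", ("t1", "c")),
    Instr("t3", "shl", ("t2", "k")),
    Instr("t4", "add", ("t3", "t3")),  # t3 is read twice, so it must be kept
    Instr("r", "sub", ("t4", "e")),
]

groups = fuse(block, live_out={"r"}, max_ops=4)
print([[ins.dest for ins in g] for g in groups])  # [['t1', 't2', 't3'], ['t4', 'r']]
print(transient_values(groups))                   # ['t1', 't2', 't4']
```

In this toy block, five instructions collapse into two fused groups and three intermediate values (`t1`, `t2`, `t4`) never occupy a register, which loosely mirrors the abstract's claim that fusion removes transient computation and spill pressure. A real compiler would also need to handle non-adjacent consumers, dependence ordering, and whatever operation patterns the VPU hardware actually supports.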

Keywords

coprocessors; embedded systems; field programmable gate arrays; integrated circuit design; logic design; program compilers; reconfigurable architectures; system-on-chip; ASIC based special-purpose accelerators; FPGA; VPU-aware compiler; automatic instruction fusion; effective processor bandwidth; embedded system efficiency; portable embedded SoC processor design; post-manufacturing reconfigurable architecture platform; versatile processing unit; Application specific integrated circuits; Bandwidth; Communication system security; Coprocessors; Cost function; Embedded system; Field programmable gate arrays; Reconfigurable architectures; Registers; Time to market;

fLanguage

English

Publisher

IEEE

Conference_Title

Quality Electronic Design, 2008. ISQED 2008. 9th International Symposium on

Conference_Location

San Jose, CA

Print_ISBN

978-0-7695-3117-5

Type

conf

DOI

10.1109/ISQED.2008.4479831

Filename

4479831

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3193831