DocumentCode :
24672
Title :
On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time
Author :
Yansheng Wang ; Leibo Liu ; Shouyi Yin ; Min Zhu ; Peng Cao ; Jun Yang ; Shaojun Wei
Author_Institution :
Nat. Lab. for Inf. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume :
22
Issue :
5
fYear :
2014
fDate :
May 2014
Firstpage :
983
Lastpage :
994
Abstract :
The coarse-grained reconfigurable architecture (CGRA) has proven to be energy efficient in several specific domains. In CGRAs, the on-chip memory hierarchy, which contains the context memory and the data memory organizations, should be carefully designed to achieve appropriate tradeoffs among three aspects: 1) performance; 2) area; and 3) power. In this paper, two techniques, the hierarchical configuration context (HCC) and the lifetime-based data-memory organization (LDO), which target the context memory and the data memory organizations, respectively, are proposed to compress the on-chip memory space and to reduce the reconfiguration time and the data-reference time. In the HCC, the contexts are constructed in a hierarchical fashion to completely eliminate the repetitive portions of the contexts, which not only reduces the overall context storage but also alleviates the context transportation overhead. A fast context-indexing mechanism in the HCC is proposed to achieve fast reconfiguration, as the hierarchically organized contexts can be located and accessed conveniently. In the LDO, the on-chip data are classified into two types based on the lifetime of the data. The short-lifetime data are stored in a first-in first-out (FIFO) buffer to increase the reuse ratio of memory space automatically, whereas the long-lifetime data are stored in random access memory (RAM) so that they can be referenced several times. The HCC and the LDO are used in a CGRA core called the reconfigurable processing unit (RPU). Two RPUs are integrated in a reconfigurable computing processor (RCP) called the REconfigurable MUlti-media System, High-Performance Processor (REMUS_HPP). Because of the HCC, compared with a traditional nonhierarchical system, the total context storage required in H.264 decoding is reduced by 77%. Because of the LDO, the normalized on-chip data memory size at the same performance level in the REMUS_HPP is only 23.8% and 14.8% of those in XPP-III (a high-performance RCP) and ADRES (a low-power RCP), respectively. REMUS_HPP is implemented on 48.9 mm² of silicon in TSMC 65-nm technology and runs at a 200-MHz working frequency to achieve 1920 × 1088 at 30 fps H.264 high-profile decoding. Compared with XPP-III, the performance of the REMUS_HPP is 1.81× higher and its energy efficiency is 4.75× higher.
Keywords :
decoding; elemental semiconductors; integrated memory circuits; low-power electronics; random-access storage; silicon; Si; TSMC technology; coarse-grained reconfigurable architecture (CGRA); data-reference time; fast context-indexing mechanism; frequency 200 MHz; hierarchical configuration context; high-performance processor; lifetime-based data-memory organization; memory space; on-chip memory hierarchy; random access memory; reconfigurable multimedia system; reconfigurable processing unit; reconfiguration time; size 65 nm; context memory; data memory; reconfigurable multimedia system high-performance processor (REMUS_HPP); video decoder
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2013.2263155
Filename :
6553240
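Note :
The lifetime-based split described in the abstract (short-lifetime data in a FIFO, long-lifetime data in RAM) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class name `LifetimeSplitMemory`, the `long_lifetime` flag, the FIFO depth, and the demo workload are all hypothetical, and the abstract does not say how lifetimes are determined.

```python
from collections import deque


class LifetimeSplitMemory:
    """Toy model (hypothetical): FIFO for short-lifetime data, dict-backed RAM for long-lifetime data."""

    def __init__(self, fifo_depth):
        self.fifo = deque()
        self.fifo_depth = fifo_depth
        self.ram = {}
        self.peak_fifo = 0  # highest FIFO occupancy observed so far

    def write(self, name, value, long_lifetime):
        if long_lifetime:
            # Long-lifetime data stay addressable and can be read many times.
            self.ram[name] = value
            return
        if len(self.fifo) >= self.fifo_depth:
            raise OverflowError("FIFO full")
        self.fifo.append(value)
        self.peak_fifo = max(self.peak_fifo, len(self.fifo))

    def read_short(self):
        # Reading a short-lifetime value frees its slot for reuse.
        return self.fifo.popleft()

    def read_long(self, name):
        return self.ram[name]


def run_demo():
    """Scale four temporaries by one coefficient that is referenced repeatedly."""
    mem = LifetimeSplitMemory(fifo_depth=2)
    mem.write("coeff", 7, long_lifetime=True)
    results = []
    for x in [1, 2, 3, 4]:
        mem.write("tmp", x, long_lifetime=False)
        results.append(mem.read_short() * mem.read_long("coeff"))
    return results, mem.peak_fifo


if __name__ == "__main__":
    print(run_demo())
```

In this toy workload, four temporaries pass through a FIFO of depth 2 because each slot is released as soon as it is read, while the single coefficient stays in RAM for repeated reference.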