• DocumentCode
    1468789
  • Title

    From Xetal-II to Xetal-Pro: On the Road Toward an Ultralow-Energy and High-Throughput SIMD Processor

  • Author

    Yu Pu ; Yifan He ; Zhenyu Ye ; Londono, S.M. ; Abbo, A.A. ; Kleihorst, R. ; Corporaal, H.

  • Author_Institution
    Sakurai Lab., Univ. of Tokyo, Tokyo, Japan
  • Volume
    21
  • Issue
    4
  • fYear
    2011
  • fDate
    4/1/2011 12:00:00 AM
  • Firstpage
    472
  • Lastpage
    484
  • Abstract
    Looking forward to the next generation of mobile streaming computing, the demanded energy efficiency of end-user terminals will become ever more stringent. The Xetal-Pro processor, the latest member of the Xetal low-power single-instruction multiple-data (SIMD) processor family from Philips, is presented in this paper. The predecessor of Xetal-Pro, known as Xetal-II, already ranks as one of the most computationally efficient processors available today [in terms of giga operations per second (GOPS) per watt]; however, it cannot yet achieve the demanded energy efficiency (less than 1 pJ per operation). Unlike Xetal-II, Xetal-Pro supports ultrawide supply voltage (Vdd) scaling from the nominal supply to the subthreshold region. Although aggressive Vdd scaling causes severe throughput degradation, this can be partly compensated for by the massive parallelism in the Xetal family. Xetal-II includes a large on-chip frame memory (FM), which does not scale well to an ultralow Vdd and therefore creates a big obstacle to increasing energy efficiency. Therefore, we investigate both different FM realizations and memory organization alternatives. A hybrid memory system (HMS), which reduces the non-local memory traffic and enables further Vdd scaling, is proposed. For design space exploration of the right number of scratchpad memory (SM) entries, the corresponding data locality analysis is also provided. Moreover, some unique circuit implementation issues of Xetal-Pro, such as the customized level-shifter, are discussed. Compared to Xetal-II operating at the nominal voltage, Xetal-Pro provides up to two times energy efficiency improvement even without Vdd scaling (essentially a consequence of data localization in the SM) when delivering the same ultrahigh throughput. With Vdd scaling into the sub/near threshold region, Xetal-Pro could gain more than ten times energy reduction while still delivering a high throughput of 0.69 GOPS (counting multiply and add operations only). The new insight of Xetal-Pro sheds light on the direction of future ultralow-energy SIMD processors.
  • Keywords
    data analysis; microprocessor chips; mobile computing; multiprocessing systems; parallel processing; SIMD processor; Xetal-II processor; Xetal-Pro processor; customized level-shifter; data locality analysis; design space exploration; end-user terminals; hybrid memory system; mobile streaming computing; nonlocal memory traffic; on-chip frame memory; scratchpad memory; single-instruction multiple data processor; ultrawide supply voltage scaling; Computer architecture; Frequency modulation; Kernel; Random access memory; System-on-a-chip; Throughput; Hybrid memory system; SIMD; Xetal; sub/near threshold; ultralow-energy;
  • fLanguage
    English
  • Journal_Title
    Circuits and Systems for Video Technology, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1051-8215
  • Type

    jour

  • DOI
    10.1109/TCSVT.2011.2125590
  • Filename
    5728854