• DocumentCode
    941121
  • Title

    An Efficient Data-Distribution Mechanism in a Processor-In-Memory (PIM) Architecture Applied to Motion Estimation

  • Author

    Kang, Jung-Yup ; Gupta, Sandeep ; Gaudiot, Jean-Luc

  • Author_Institution
    Mindspeed Technol., Inc., Newport Beach
  • Volume
    57
  • Issue
    3
  • fYear
    2008
  • fDate
    3/1/2008
  • Firstpage
    375
  • Lastpage
    388
  • Abstract
    In general, the main purpose of using processor-in-memory (PIM) modules is to dramatically increase data-level parallelism (DLP) and to avoid the limited issue rate of current systems (even those with SIMD extensions), which is caused by the limited data bandwidth and functional units. Our approach is to divide the PIM module into hundreds of smaller PIMs so that each can execute motion estimation for a group of macroblocks in parallel. We also design the logic in each PIM to execute in a highly pipelined fashion so that even more parallelism can be exploited. The main contribution of this paper is a set of architectural techniques that can be used in the PIM module to overcome the addressing and data-sharing overhead that arises when these smaller PIMs are used. We apply these techniques to motion estimation, which has been reported to take the majority of the execution time of MPEG encoding and has therefore been widely studied. With our paradigm and techniques, the host processor is relieved of the most computationally demanding and data-intensive portions of the workload, which should yield a significant performance gain. Indeed, when 512 of these smaller PIMs were used, we observed a reduction in the number of memory accesses by a factor of up to 2,034 and a performance improvement by a factor of up to 439.
  • Keywords
    image coding; memory architecture; motion estimation; MPEG encoding; SIMD extensions; data sharing overhead; data-level parallelism; efficient data-distribution mechanism; processor-in-memory architecture; Bandwidth; Computer architecture; Concurrent computing; Encoding; Hardware; Logic design; MPEG 4 Standard; Parallel processing; Performance gain; Real-time and embedded systems; Special-Purpose and Application-Based Systems
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2007.70818
  • Filename
    4358277