• DocumentCode
    1692859
  • Title
    Variable block size motion estimation implementation on compute unified device architecture (CUDA)
  • Author
    Dong-Kyu Lee; Seoung-Jun Oh
  • fYear
    2013
  • Firstpage
    633
  • Lastpage
    634
  • Abstract
    This paper proposes a highly parallel variable block size full search motion estimation algorithm with concurrent parallel reduction (CPR) for the graphics processing unit (GPU) using the compute unified device architecture (CUDA). The approach minimizes memory access latency by using the GPU's high-speed on-chip memory. By applying parallel reductions concurrently according to the amount of data and the data dependencies, the proposed approach increases thread utilization and reduces the number of synchronization points, which cause latency. Experimental results show that the proposed approach achieves a speedup of up to 92 times over its central processing unit (CPU)-only counterpart.
  • Keywords
    data compression; graphics processing units; motion estimation; search problems; video coding; CPR; CPU; CUDA; GPU; H.264-AVC standard; central processing unit; compute unified device architecture; concurrent parallel reduction; graphics processing unit; high-speed on-chip memory; highly parallel variable block size full search motion estimation algorithm; memory access latency minimization; parallel reductions; synchronization points; variable block size motion estimation implementation; Computer architecture; Graphics processing units; High definition video; Instruction sets; Motion estimation; Synchronization; Video coding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Consumer Electronics (ICCE), 2013 IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    2158-3994
  • Print_ISBN
    978-1-4673-1361-2
  • Type
    conf
  • DOI
    10.1109/ICCE.2013.6487048
  • Filename
    6487048
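The abstract describes variable block size full search motion estimation, in which the matching costs of larger blocks depend on those of smaller ones. The following is a minimal sequential C++ sketch of that dependency structure. It is not the authors' CUDA implementation, and the concurrent parallel reduction (CPR) scheduling is not reproduced. The sketch rests on several assumptions:
- H.264/AVC-style partitions of a 16x16 macroblock (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4, giving 41 partitions in total).
- Sum of absolute differences (SAD) as the cost.
- Integer-pel search over a square window.

The names `Frame`, `partition_sads`, `full_search` and `Best` are hypothetical. For each candidate displacement, the sketch computes the sixteen 4x4 SADs and then sums them pairwise up a small reduction tree into every larger partition.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical frame type: 8-bit luma samples stored row-major.
struct Frame {
    int width;
    int height;
    std::vector<uint8_t> pix;
    int at(int x, int y) const { return pix[y * width + x]; }
};

// Partition SAD layout for one 16x16 macroblock (41 entries, H.264-style):
// [0..15] 4x4, [16..23] 8x4, [24..31] 4x8, [32..35] 8x8,
// [36..37] 16x8, [38..39] 8x16, [40] 16x16.
constexpr int kNumParts = 41;

// SADs of all 41 partitions for one candidate displacement (dx, dy).
// The caller must keep the macroblock inside cur and the displaced block inside ref.
std::array<int, kNumParts> partition_sads(const Frame& cur, const Frame& ref,
                                          int mbx, int mby, int dx, int dy) {
    std::array<int, kNumParts> s{};
    // Level 0: sixteen independent 4x4 SADs, s[by * 4 + bx] for bx, by in 0..3.
    for (int by = 0; by < 4; ++by)
        for (int bx = 0; bx < 4; ++bx) {
            int sad = 0;
            for (int y = 0; y < 4; ++y)
                for (int x = 0; x < 4; ++x) {
                    int cx = mbx + bx * 4 + x;
                    int cy = mby + by * 4 + y;
                    sad += std::abs(cur.at(cx, cy) - ref.at(cx + dx, cy + dy));
                }
            s[by * 4 + bx] = sad;
        }
    // 8x4: horizontal pairs of 4x4 blocks (row by, pair px).
    for (int by = 0; by < 4; ++by)
        for (int px = 0; px < 2; ++px)
            s[16 + by * 2 + px] = s[by * 4 + 2 * px] + s[by * 4 + 2 * px + 1];
    // 4x8: vertical pairs of 4x4 blocks (pair py, column bx).
    for (int py = 0; py < 2; ++py)
        for (int bx = 0; bx < 4; ++bx)
            s[24 + py * 4 + bx] = s[(2 * py) * 4 + bx] + s[(2 * py + 1) * 4 + bx];
    // 8x8: vertical pairs of 8x4 blocks.
    for (int qy = 0; qy < 2; ++qy)
        for (int qx = 0; qx < 2; ++qx)
            s[32 + qy * 2 + qx] = s[16 + (2 * qy) * 2 + qx] + s[16 + (2 * qy + 1) * 2 + qx];
    // 16x8: horizontal pairs of 8x8; 8x16: vertical pairs of 8x8.
    for (int q = 0; q < 2; ++q) {
        s[36 + q] = s[32 + q * 2] + s[32 + q * 2 + 1];
        s[38 + q] = s[32 + q] + s[34 + q];
    }
    // 16x16: sum of the two 16x8 halves.
    s[40] = s[36] + s[37];
    return s;
}

struct Best {
    int sad = INT_MAX;
    int dx = 0;
    int dy = 0;
};

// Integer-pel full search over [-range, range]^2, keeping the best
// displacement independently for each of the 41 partitions.
std::array<Best, kNumParts> full_search(const Frame& cur, const Frame& ref,
                                        int mbx, int mby, int range) {
    assert(range >= 0);
    std::array<Best, kNumParts> best{};
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            // Skip candidates whose 16x16 block would leave the reference frame.
            if (mbx + dx < 0 || mby + dy < 0 ||
                mbx + dx + 16 > ref.width || mby + dy + 16 > ref.height)
                continue;
            const std::array<int, kNumParts> s = partition_sads(cur, ref, mbx, mby, dx, dy);
            for (int p = 0; p < kNumParts; ++p)
                if (s[p] < best[p].sad) {
                    best[p].sad = s[p];
                    best[p].dx = dx;
                    best[p].dy = dy;
                }
        }
    return best;
}
```

In a GPU port, the 4x4 SADs for every candidate are mutually independent and could be computed by separate threads. The merge levels form a reduction tree whose inputs shrink from 16 to 1, so each level needs a synchronization point if it is run as a separate step. This is presumably the kind of latency that the abstract's concurrent reductions aim to reduce. The exact thread mapping used in the paper is not shown here.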