• DocumentCode
    3590655
  • Title
    Optimizing memory bandwidth in OpenVX graph execution on embedded many-core accelerators
  • Author
    Tagliavini, Giuseppe; Haugou, Germain; Benini, Luca
  • Author_Institution
    Univ. of Bologna, Bologna, Italy
  • fYear
    2014
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Computer vision and computational photography are hot application areas for mobile and embedded computing platforms. As a consequence, many-core accelerators are being developed to efficiently execute highly parallel image processing kernels. However, power and cost constraints impose hard limits on the available main memory bandwidth, and push for software optimizations that minimize the use of large frame buffers to store the intermediate results of multi-kernel applications. In this work we propose a set of techniques, mainly based on graph analysis and image tiling, aimed at accelerating image processing applications, expressed as standard OpenVX graphs, on cluster-based many-core accelerators. We have developed a run-time framework that implements these techniques using a front-end compliant with the OpenVX standard. The framework is based on an OpenCL extension that enables more explicit control and efficient reuse of on-chip memory and greatly reduces the recourse to off-chip memory for storing intermediate results. Experiments performed on the STHORM many-core accelerator prototype demonstrate that our approach leads to massive reductions of main-memory-related stall time, even when the main memory bandwidth available to the accelerator is severely constrained.
  • Keywords
    computer vision; embedded systems; parallel processing; storage management; OpenCL extension; OpenVX graph execution; STHORM many-core accelerator; cluster-based many-core accelerator; computational photography; embedded computing platform; embedded many-core accelerator; graph analysis; image processing application; image tiling; memory bandwidth optimization; mobile computing; mobile computing platform; multikernel application; off-chip memory; on-chip memory; parallel image processing kernel; software optimization; standard OpenVX graph; Acceleration; Bandwidth; Computer architecture; Image processing; Kernel; Optimization; Standards
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design and Architectures for Signal and Image Processing (DASIP), 2014 Conference on
  • Type
    conf
  • DOI
    10.1109/DASIP.2014.7115617
  • Filename
    7115617