• DocumentCode
    769385
  • Title
    Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements
  • Author
    Talla, Deepu ; John, Lizy Kurian ; Burger, Doug
  • Author_Institution
    Texas Instruments Inc., Dallas, TX, USA
  • Volume
    52
  • Issue
    8
  • fYear
    2003
  • Firstpage
    1015
  • Lastpage
    1031
  • Abstract
    Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match well with the access patterns and loop structures of media programs. We find that 75 to 85 percent of the dynamic instructions in the processor instruction stream are supporting instructions necessary to feed the SIMD execution units rather than true/useful computations, resulting in the underutilization of SIMD execution units (only 1 to 12 percent of the peak SIMD execution units' throughput is achieved). Rather than focusing on exploiting more data-level parallelism (DLP), we focus on the instructions that support the SIMD computations and exploit both fine- and coarse-grained instruction-level parallelism (ILP) in the supporting instruction stream. We propose the MediaBreeze architecture that uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.). Our results on multimedia kernels show that a 2-way processor with SIMD extensions enhanced with MediaBreeze provides better performance than a 16-way processor with current SIMD extensions. In the case of application benchmarks, a 2-/4-way processor with SIMD extensions augmented with MediaBreeze outperforms a 4-/8-way processor with SIMD extensions. A first-order approximation using ASIC synthesis tools and cell-based libraries shows that this acceleration is achieved at a 10 percent increase in area required by MMX and SSE extensions (0.3 percent increase in overall chip area) and 1 percent of total processor power consumption.
  • Keywords
    multimedia systems; parallel architectures; parallel programming; performance evaluation; program control structures; ASIC synthesis tool; AltiVec; MMX; MediaBreeze architecture; SIMD; address generation; cell-based libraries; data reorganization; data-level parallelism; general-purpose processors; instruction level parallelism; multimedia; processor instruction stream; program loop structures; throughput; total processor power consumption; Computer aided instruction; Computer architecture; Concurrent computing; Feeds; Hardware; Kernel; Parallel processing; Pattern matching; Streaming media; Throughput
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2003.1223637
  • Filename
    1223637