Abstract:
Within SoCs for embedded media applications, performance and computational efficiency can only grow through scalability, dedicated instructions, and specialized memory subsystems. Scalability, in turn, can only be achieved through short wires and low fan-in and fan-out. The presented multiprocessor template exploits these properties. It provides multiple threads of control (cell-level multiprocessing), with each processor cell having a multitude of issue slots, localised register files, memories, and interconnects. The template lets the compiler control all of these resources separately, eliminating hardware overhead for instruction decoding, pipeline control, hazard detection, and bypass networks. The template also integrates media-oriented and SIMD instructions, which the compiler can select automatically. This paper describes the template underlying several multimedia multiprocessor designs.
Keywords:
embedded systems; multimedia computing; multiprocessing systems; parallel processing; program compilers; system-on-chip; SIMD instruction; SoC; bypass network; compiler; dedicated instruction; embedded media application; hazard detection; instruction decoding; issue slot multitude; localised register file; media-oriented instruction; memory subsystem; multiprocessor template; pipeline control; register interconnects; register memory; scalability; Automatic control; Computational efficiency; Decoding; Hardware; Pipelines; Process control; Registers; Scalability; Wires; Yarn