Author :
Kyo, Shorin ; Okazaki, Shin'ichiro ; Arai, Tamio
Abstract :
Embedded processors for video image recognition usually need to address the conventional trade-off between cost (die size and power) and real-time performance, and must also remain highly flexible because of the immense diversity of recognition targets, situations, and applications. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these trade-off requirements. By using parallel and systolic algorithmic techniques on a simple linear array architecture, IMAP exploits not only the straightforward per-image-row data level parallelism (DLP), but also the inherent DLP of other memory access patterns frequently found in image recognition tasks, while allowing programming to be done in an explicit parallel C language (1DC). We describe and evaluate IMAP-CE, one of the latest IMAP processors, which integrates 128 8-bit 4-way VLIW PEs running at 100 MHz, 128 2-KByte RAMs, and one 16-bit RISC control processor onto a single chip. The PE instruction set is enhanced to support 1DC code. The IMAP-CE die measures 11 × 11 mm² and integrates 32.7 M transistors, and its average power consumption is approximately 2 W. IMAP-CE is evaluated mainly by comparing its performance running 1DC code with that of a 2.4 GHz Intel P4 running optimized C code. With the use of parallelizing techniques, benchmark results show a speedup of up to 20 times for image filter kernels and of 4 times for a full image recognition application.
Keywords :
C language; embedded systems; image recognition; instruction sets; parallel algorithms; parallel memories; random-access storage; reduced instruction set computing; systolic arrays; IMAP processors; PE instruction set; RISC control processor; VLIW; embedded image recognition systems; embedded processors; image filter kernels; integrated memory array processor; linear array architecture; memory access patterns; memory array architecture; parallel C language; parallel SIMD linear processor; parallel algorithmic technique; per-image row data level parallelism; power consumption; systolic algorithmic techniques; transistors; Costs; Linear programming; Memory architecture; Parallel programming; Process control; Random access memory; Target recognition; Parallel SIMD processor; image processing; memory array processor; parallel language