DocumentCode :
3351706
Title :
16-bit FP sub-word parallelism to facilitate compiler vectorization and improve performance of image and media processing
Author :
Etiemble, Daniel ; Lacassagne, Lionel
Author_Institution :
LRI, France
fYear :
2004
fDate :
15-18 Aug. 2004
Firstpage :
540
Abstract :
We consider the implementation of 16-bit floating-point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks with these new simulated instructions, we show that a significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up comes both from doubling the number of operations per SIMD instruction and from the better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up mainly comes from the wider SIMD instructions.
Keywords :
floating point arithmetic; image processing; instruction sets; multimedia systems; parallel processing; program compilers; 16-bit floating point instruction; 32-bit FP versions; Pentium 4 instruction; PowerPC G5 instruction; SIMD instruction; compiler vectorization; data stream process; image processing; media processing; Computer aided instruction; Digital signal processing; Dynamic range; Graphics; Image processing; Image storage; Microprocessors; Parallel processing; Pixel; Streaming media;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing, 2004. ICPP 2004. International Conference on
ISSN :
0190-3918
Print_ISBN :
0-7695-2197-5
Type :
conf
DOI :
10.1109/ICPP.2004.1327964
Filename :
1327964