Title :
SIMD Architectural Enhancements to Improve the Performance of the 2D Discrete Wavelet Transform
Author :
Shahbahrami, Asadollah ; Juurlink, Ben
Author_Institution :
Comput. Eng. Lab., Delft Univ. of Technol., Delft, Netherlands
Abstract :
The 2D Discrete Wavelet Transform (DWT) is a time-consuming kernel in many multimedia applications such as JPEG2000 and MPEG-4. The 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. The vertical filtering is easy to vectorize (assuming row-major order), but vectorizing the horizontal filtering requires many overhead instructions. In this paper we propose several SIMD architectural enhancements, namely the MAC operation, extended subwords, and the matrix register file technique, to develop high-performance implementations of the 2D DWT on SIMD architectures. The MAC operation performs four 32-bit single-precision floating-point multiplications with accumulation. The matrix register file allows data stored consecutively in memory to be loaded into a column of the register file, where a column consists of the corresponding subwords of different registers. These techniques avoid the need for data rearrangement instructions. In addition, to avoid data type conversion instructions, the extended subword technique is applied to the (5, 3) lifting transform. Extended subwords use registers that are wider than the packed format used to store the data. These techniques provide speedups of up to 2.90 and 1.32 for the (5, 3) lifting and Daub-4 transforms, respectively.
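The row-column structure described in the abstract can be illustrated with a minimal scalar sketch. It is not the SIMD implementation proposed in the paper. It assumes the reversible integer form of the (5, 3) lifting transform as used in JPEG2000, with mirrored boundaries and even-sized inputs; the function names lift53_forward, lift53_inverse, and dwt53_2d are hypothetical and chosen for illustration only.

```python
def lift53_forward(x):
    """One level of the integer (5, 3) lifting transform (assumed JPEG2000 form).

    Returns the low-pass coefficients followed by the high-pass coefficients.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even length of at least 2")
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: each odd sample minus the floor-average of its even
    # neighbours; the right neighbour is mirrored at the boundary.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: each even sample plus a rounded quarter of the two
    # neighbouring details; the left detail is mirrored at the boundary.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s + d


def lift53_inverse(c):
    """Undo lift53_forward by running the lifting steps in reverse order."""
    half = len(c) // 2
    s, d = c[:half], c[half:]
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


def dwt53_2d(image):
    """One decomposition level: horizontal pass on rows, then vertical pass."""
    # Horizontal filtering: neighbouring samples sit next to each other in the
    # same row, which is why a SIMD version needs to rearrange the data.
    rows = [lift53_forward(list(r)) for r in image]
    # Vertical filtering: the same column index is processed in every row, so
    # adjacent columns map naturally onto the subwords of a SIMD register.
    cols = [lift53_forward(list(c)) for c in zip(*rows)]
    # Transpose back to row-major layout.
    return [list(r) for r in zip(*cols)]
```

Because every lifting step uses only integer additions and shifts, lift53_inverse reconstructs the input exactly. The intermediate sums can exceed the range of the stored pixel type, which is the situation the extended subword technique in the abstract is meant to handle without explicit type conversions.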
Keywords :
data flow graphs; discrete wavelet transforms; filtering theory; floating point arithmetic; image processing; parallel processing; 2D discrete wavelet transform; 32-bit single-precision floating-point multiplications; Daub-4 transform; MAC; SIMD architectural enhancements; data flow graph; data rearrangement instructions; extended subword technique; horizontal filtering; image processing; lifting transform; matrix register file; time-consuming kernel; vectorization; vertical filtering; Design engineering; Discrete cosine transforms; Discrete transforms; Discrete wavelet transforms; Encoding; Filtering; Hardware; MPEG 4 Standard; Registers; Transform coding; DWT; Parallelization; SIMD Architectures;
Conference_Title :
Digital System Design, Architectures, Methods and Tools, 2009. DSD '09. 12th Euromicro Conference on
Conference_Location :
Patras
Print_ISBN :
978-0-7695-3782-5
DOI :
10.1109/DSD.2009.189