Conference record number:
3297
Paper title:
High Performance Implementation of 2D Convolution using Intel’s Advanced Vector Extensions
Authors:
Hossein Amiri (Department of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran), Asadollah Shahbahrami (Department of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran)
Keywords:
AVX2 , SIMDization , Vectorization , 2D convolution , Parallel Programming
Conference title:
19th International Symposium on Artificial Intelligence and Signal Processing
Abstract (English):
Convolution is one of the most important and fundamental operations in multimedia processing. In digital image processing, for example, 2D convolution is used for various filtering operations. Because it involves many arithmetic operations and is applied to every image pixel, it is a compute-intensive kernel. To improve its performance, in this paper we apply two approaches to vectorize it, broadcasting of coefficients and repetition of coefficients, using the Intrinsic Programming Model (IPM) and AVX technology. Our experimental results on an Intel Skylake microarchitecture show that broadcasting of coefficients performs much better than repetition of coefficients across different filter sizes and image sizes. In addition, to evaluate the performance of Compiler Automatic Vectorization (CAV) and of the OpenCV library for this kernel, we use the GCC and LLVM compilers. Our experimental results show that both IPM implementations are faster than the GCC and LLVM auto-vectorized versions.
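The abstract names the broadcasting-of-coefficients strategy without showing code. The following is a minimal C sketch of that idea under stated assumptions: a single-precision grayscale image stored row-major, a square k x k kernel, "valid" output only (no border handling), the kernel applied as correlation (not flipped), and plain AVX multiply/add rather than FMA. Each coefficient is broadcast to all eight lanes with `_mm256_set1_ps` and multiplied with eight consecutive pixels, so one pass over the kernel produces eight adjacent output pixels. The function names (`conv2d`, `conv2d_scalar`, `conv2d_avx_broadcast`, `conv_at`) are hypothetical and do not come from the paper, whose data types and loop order may differ. `__attribute__((target("avx")))` and `__builtin_cpu_supports` are GCC/Clang extensions used here so the file builds without extra flags and falls back to the scalar loop on CPUs without AVX.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CONV_HAVE_X86 1
#endif

/* One output pixel of a "valid" k x k convolution on a row-major float
   image of width w. Applied as correlation (kernel not flipped). */
static float conv_at(const float *img, int w, const float *ker, int k,
                     int x, int y)
{
    float acc = 0.0f;
    for (int i = 0; i < k; i++)
        for (int j = 0; j < k; j++)
            acc += img[(y + i) * w + (x + j)] * ker[i * k + j];
    return acc;
}

/* Scalar reference: the plain loop nest a compiler would try to
   auto-vectorize. Output size is (w - k + 1) x (h - k + 1). */
void conv2d_scalar(const float *img, int w, int h, const float *ker, int k,
                   float *out)
{
    int ow = w - k + 1, oh = h - k + 1;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++)
            out[y * ow + x] = conv_at(img, w, ker, k, x, y);
}

#ifdef CONV_HAVE_X86
/* Broadcasting of coefficients (sketch): every kernel coefficient is
   broadcast to all 8 lanes and multiplied with 8 consecutive pixels, so
   one pass over the kernel yields 8 adjacent output pixels. */
__attribute__((target("avx")))
static void conv2d_avx_broadcast(const float *img, int w, int h,
                                 const float *ker, int k, float *out)
{
    int ow = w - k + 1, oh = h - k + 1;
    for (int y = 0; y < oh; y++) {
        int x = 0;
        for (; x + 8 <= ow; x += 8) {
            __m256 acc = _mm256_setzero_ps();
            for (int i = 0; i < k; i++) {
                const float *row = &img[(y + i) * w + x];
                for (int j = 0; j < k; j++) {
                    __m256 c = _mm256_set1_ps(ker[i * k + j]); /* broadcast */
                    __m256 p = _mm256_loadu_ps(row + j);       /* 8 pixels */
                    acc = _mm256_add_ps(acc, _mm256_mul_ps(p, c));
                }
            }
            _mm256_storeu_ps(&out[y * ow + x], acc);
        }
        for (; x < ow; x++) /* scalar tail for the last ow % 8 pixels */
            out[y * ow + x] = conv_at(img, w, ker, k, x, y);
    }
}
#endif

/* Dispatcher: use the AVX path when the CPU supports it, else scalar. */
void conv2d(const float *img, int w, int h, const float *ker, int k,
            float *out)
{
    assert(k >= 1 && k <= w && k <= h);
#ifdef CONV_HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        conv2d_avx_broadcast(img, w, h, ker, k, out);
        return;
    }
#endif
    conv2d_scalar(img, w, h, ker, k, out);
}
```

The scalar `conv2d_scalar` loop nest is the kind of code that GCC and Clang/LLVM try to auto-vectorize at `-O3` when given a suitable `-mavx2` or `-march` flag. Benchmarking it against the intrinsic version is one way to reproduce the style of comparison the abstract describes.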