Title :
Compressing Floating-Point Number Stream for Numerical Applications
Author :
Tomari, Hisanobu ; Inaba, Mary ; Hiraki, Kei
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
Abstract :
Clusters of commodity computers and general-purpose computers equipped with accelerators such as GPGPUs are now common platforms for computationally intensive tasks such as scientific simulations. Both technologies provide users with high performance at relatively low cost. However, interconnect bandwidth that is low relative to computing performance hinders efficient operation of both clusters and accelerators for the many algorithms that require heavy data transmission. For clusters, the network is one of the major performance bottlenecks; for accelerators, it is the peripheral bus that transfers data from the host to the memory on the accelerator card. In this paper, we propose a method of accelerating floating-point intensive algorithms by compressing the floating-point number stream. Using an efficient software encoder and a hardware decoder, the method eliminates redundancy in the exponent part of the numbers in the stream and compacts the entire array to 82.8% of its original size at the theoretical limit. The compression ratio is better than that of Gzip or Bzip2 for floating-point numbers. The reduction in communication time leads directly to a reduction in total running time for programs whose processing time is dominated by communication. We implemented a high-speed decoder on an FPGA that operates at over 6 GB/s. We estimated application performance using FFT on a cluster and matrix multiplication on the GRAPE-DR accelerator, and found our approach useful in both configurations.
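The 82.8% figure can be checked with a short sketch. It assumes, which the abstract does not state, that the stream holds IEEE 754 double-precision values (64 bits with an 11-bit exponent field) and that the theoretical limit corresponds to removing the exponent field entirely. The helper name `split_double` is hypothetical and is not taken from the paper; this is not the authors' encoder.

```python
import struct

def split_double(x: float):
    """Split an IEEE 754 double into (sign, exponent, mantissa) bit fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)        # 52-bit fraction
    return sign, exponent, mantissa

# Assumption: double precision, with the exponent field fully removed.
TOTAL_BITS = 64
EXPONENT_BITS = 11
limit = (TOTAL_BITS - EXPONENT_BITS) / TOTAL_BITS   # 53 / 64

# Values of similar magnitude share one exponent, which is the redundancy
# an exponent-oriented compressor can exploit.
values = [1.0, 1.5, 1.75, 1.999]
exponents = {split_double(v)[1] for v in values}

print(f"size at limit: {limit:.1%}")
print(f"distinct exponents among {len(values)} values: {len(exponents)}")
```

Under these assumptions, 53 of every 64 bits remain, which matches the stated 82.8%.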
Keywords :
bandwidth allocation; coprocessors; data communication; data compression; fast Fourier transforms; field programmable gate arrays; floating point arithmetic; general purpose computers; integrated circuit interconnections; matrix multiplication; peripheral interfaces; redundancy; Bzip2; FFT; FPGA; GPGPU; GRAPE-DR accelerator; Gzip; accelerator card; accelerators; commodity computers; communication time; compressing floating-point number stream; compression ratio; computationally intensive tasks; computing performance; floating point number stream; floating point numbers; floating-point intensive algorithms; general-purpose computers; hardware decoder; heavy data transmission; high-speed decoder; low bandwidth interconnect; numerical applications; peripheral bus; scientific simulations; software encoder; compression; data transmission; floating-point;
Conference_Title :
Networking and Computing (ICNC), 2010 First International Conference on
Conference_Location :
Higashi-Hiroshima
Print_ISBN :
978-1-4244-8918-3
Electronic_ISBN :
978-0-7695-4277-5
DOI :
10.1109/IC-NC.2010.24