Title :
Efficient data transfer scheme using word-pair-encoding-based compression for large-scale text-data processing
Author :
Waidyasooriya, Hasitha Muthumala ; Ono, Daisuke ; Hariyama, Masanori ; Kameyama, Michitaka
Author_Institution :
Grad. Sch. of Inf. Sci., Tohoku Univ., Sendai, Japan
Abstract :
Large-scale data processing is common in fields such as data mining and genome mapping. To accelerate such processing, graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) are used. However, the long data transfer time between the accelerator and the host computer is a major performance bottleneck. In this paper, we use a word-pair-encoding method to compress the data down to 25% of its original size. The encoded data can be decoded from any position without decoding the whole data file. For some algorithms, the encoded data can be processed without decoding. Using text search based on the Burrows-Wheeler algorithm, we show that the data amount and transfer time can be reduced by over 70%.
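The abstract does not spell out the encoding itself, so the following is only a minimal sketch of a generic greedy pair-substitution code, not the authors' exact scheme. It assumes the input is a list of small integer symbols (for example, nucleotides mapped to 0..3) and illustrates why such a code can be decoded from any position: every code expands through the pair table independently of its neighbours. The names `encode_pairs`, `expand`, and `decode_from`, and the `rounds` parameter, are hypothetical.

```python
from collections import Counter


def encode_pairs(symbols, alphabet_size, rounds):
    """Greedy pair substitution (illustrative, not the paper's algorithm).

    Each round replaces the most frequent adjacent pair with a fresh code.
    Returns the encoded sequence and a table mapping code -> (left, right).
    """
    seq = list(symbols)
    table = {}
    next_code = alphabet_size
    for _ in range(rounds):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so substitution would not help
        table[next_code] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_code)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_code += 1
    return seq, table


def expand(code, table):
    """Expand a single code back to base symbols using an explicit stack."""
    stack, out = [code], []
    while stack:
        c = stack.pop()
        if c in table:
            left, right = table[c]
            stack.append(right)  # push right first so left is expanded first
            stack.append(left)
        else:
            out.append(c)
    return out


def decode_from(encoded, table, start, count):
    """Decode `count` codes starting at index `start`, touching nothing else."""
    out = []
    for c in encoded[start:start + count]:
        out.extend(expand(c, table))
    return out
```

In this sketch, `decode_from(encoded, table, k, n)` reads only the `n` codes it is asked for plus the pair table, which is the property that lets an accelerator start work in the middle of a transferred buffer. A fixed code width, which the sketch does not enforce, would additionally make the offset of code `k` computable directly.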
Keywords :
data compression; data mining; encoding; field programmable gate arrays; graphics processing units; text analysis; Burrows-Wheeler algorithm based text search; FPGA; GPU; data transfer scheme; data-mining; encoded data; field-programmable gate-array; genome mapping; graphic accelerator units; large-scale text-data processing; performance bottleneck; word-pair-encoding-based compression; Arrays; Bioinformatics; Data compression; Data transfer; Encoding; Genomics; Graphics processing units; Succinct data structures; big data
Conference_Title :
Circuits and Systems (APCCAS), 2014 IEEE Asia Pacific Conference on
Conference_Location :
Ishigaki
DOI :
10.1109/APCCAS.2014.7032862