Title :
Group-Scheme: SIMD-based compression algorithms for web text data
Author :
Xudong Zhang; Wayne Xin Zhao; Dongdong Shan; Hongfei Yan
Author_Institution :
Dept. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
Compression algorithms have been quite important for data-oriented tasks, especially in the era of Big Data. The rapid development of modern processors has given us powerful SIMD instruction sets, which provide an opportunity for better performance. Although SIMD-based optimization of compression has been explored in some studies [2, 7], these studies usually focus on modifying existing algorithms to fit SIMD instructions. In this paper, we propose a compression framework with a novel storage layout format, which aims to improve the instruction-level parallelizability of compression algorithms. By instantiating the framework, we design a novel family of compression algorithms, called Group-Scheme, and present a parallelized version of it, called SIMD-Group-Scheme. We evaluate the proposed algorithms on two public TREC data sets. While remaining very competitive in compression ratio and encoding speed, SIMD-Group-Scheme significantly outperforms both the implementation without SIMD instructions and the state-of-the-art algorithm (i.e., SIMD-G8IU [7]) in decoding speed.
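The abstract does not spell out Group-Scheme's storage layout. For orientation only, the sketch below shows the classic Group Varint layout. This is a well-known byte-oriented integer code and is not Group-Scheme itself, nor SIMD-G8IU. Each group of four unsigned 32-bit integers is stored as one descriptor byte holding four 2-bit length codes, followed by the 1 to 4 little-endian payload bytes of each value. Because the descriptor tells a decoder every value's length before it touches the payload, grouped layouts of this kind lend themselves to table-driven or SIMD shuffle decoding. That is the general motivation for layout-level design. The function names `encode_group_varint` and `decode_group_varint` are hypothetical, and padding the last group with zeros is an assumption of this sketch.

```python
def encode_group_varint(values):
    """Encode unsigned 32-bit ints in groups of four: one descriptor byte
    (four 2-bit length codes, slot 0 in the low bits) followed by the
    1-4 little-endian payload bytes of each value in the group."""
    out = bytearray()
    for i in range(0, len(values), 4):
        group = list(values[i:i + 4])
        group += [0] * (4 - len(group))  # assumption: pad the last group with zeros
        descriptor = 0
        payload = bytearray()
        for slot, v in enumerate(group):
            if not 0 <= v < 1 << 32:
                raise ValueError("values must fit in 32 unsigned bits")
            nbytes = max(1, (v.bit_length() + 7) // 8)  # 1..4 bytes
            descriptor |= (nbytes - 1) << (2 * slot)    # store length - 1 in 2 bits
            payload += v.to_bytes(nbytes, "little")
        out.append(descriptor)
        out += payload
    return bytes(out)


def decode_group_varint(data, count):
    """Decode `count` integers produced by encode_group_varint."""
    values = []
    pos = 0
    while len(values) < count:
        descriptor = data[pos]
        pos += 1
        # All four lengths are known from the descriptor before reading payload.
        for slot in range(4):
            nbytes = ((descriptor >> (2 * slot)) & 0b11) + 1
            values.append(int.from_bytes(data[pos:pos + nbytes], "little"))
            pos += nbytes
    return values[:count]  # drop any zero padding from the final group
```

For example, `encode_group_varint([1, 300, 70000, 2**31])` yields 11 bytes: the descriptor `0xE4` followed by 1 + 2 + 3 + 4 payload bytes.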
Keywords :
data compression; indexing; parallel processing; text analysis; SIMD instruction sets; SIMD-based compression algorithms; SIMD-group-scheme; Web text data; compression ratio; encoding speed; instruction-level parallelizability; public TREC data sets; storage layout format; Arrays; Compression algorithms; Decoding; Encoding; Indexes; Layout; Program processors; SIMD; index compression; integer encoding; inverted index;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691617