Title :
High density compression of log files
Author :
Rácz, Balázs ; Lukács, András
Author_Institution :
Inst. of Math., Budapest Univ. of Technol. & Econ., Hungary
Abstract :
There is an emerging demand among Internet and network-related services to collect valuable service usage data and process it with data mining methods. This paper presents a generalized scheme for preprocessing and high-density compression of log files. The aim of the method is to provide a basis for long-term storage in a form suitable for direct processing by data mining algorithms. Experiments on real log data show that the differentiated semantic log compression (DSLC) methods compress to 2-3% of the original size, outperforming general-purpose compression utilities. The paper also demonstrates the flexibility of the pipeline concept by embedding a field-wise compression algorithm to improve compression efficiency. The implementation of this scheme was designed for the largest Hungarian Internet content provider.
Keywords :
Internet; data compression; data mining; pipeline processing; program processors; Hungarian Internet; compression efficiency; content provider; differentiated semantic log compression methods; field-wise compression algorithm; high density compression; log files; pipeline concept; preprocessing scheme; Automation; Compression algorithms; Encoding; IP networks; Informatics; Laboratories; Mathematics; Pipelines; Web and internet services
Conference_Title :
Data Compression Conference, 2004. Proceedings. DCC 2004
Print_ISBN :
0-7695-2082-0
DOI :
10.1109/DCC.2004.1281533