Title :
Sub-atomic field processing for improved web log compression
Author :
Deorowicz, Sebastian ; Grabowski, Szymon
Author_Institution :
Inst. Informatyki, Politech. Śląska, Gliwice, Poland
Abstract :
Web log files, which store user activity on a server, may grow by hundreds of megabytes a day, or even more, on popular sites. It makes sense to archive old logs so that they can be analyzed later, e.g., to detect attacks or other server abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. Our method works on individual fields of log data (each storing information such as the client's IP, date/time, requested file or query, download size in bytes, etc.) and uses compression techniques such as finding and extracting common prefixes and suffixes, dictionary-based phrase sequence substitution, move-to-front coding, and more. The test results show that the proposed transform improves the average compression ratio by a factor of 2.64 for gzip and 1.83 for bzip2.
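Two of the techniques named in the abstract, common-prefix extraction and move-to-front coding, can be illustrated with a minimal Python sketch. This is not the authors' preprocessor: the function names, the choice of the previous value as the prefix reference, and the escape format for unseen symbols are all assumptions made for illustration only.

```python
def prefix_encode(values):
    """Replace each value by (shared-prefix length with previous value, remainder).

    Assumption: the prefix is taken against the immediately preceding value.
    """
    prev = ""
    out = []
    for v in values:
        n = 0
        limit = min(len(prev), len(v))
        while n < limit and prev[n] == v[n]:
            n += 1
        out.append((n, v[n:]))
        prev = v
    return out


def prefix_decode(pairs):
    """Inverse of prefix_encode."""
    prev = ""
    out = []
    for n, rest in pairs:
        v = prev[:n] + rest
        out.append(v)
        prev = v
    return out


def mtf_encode(symbols):
    """Move-to-front: emit each symbol's position in a recency list.

    Assumption: a symbol not seen before is escaped as (list length, symbol).
    """
    recent = []  # most recently used symbol first
    out = []
    for s in symbols:
        if s in recent:
            idx = recent.index(s)
            out.append(idx)
            recent.pop(idx)
        else:
            out.append((len(recent), s))
        recent.insert(0, s)
    return out


def mtf_decode(codes):
    """Inverse of mtf_encode."""
    recent = []
    out = []
    for c in codes:
        s = c[1] if isinstance(c, tuple) else recent.pop(c)
        recent.insert(0, s)
        out.append(s)
    return out


# Hypothetical field values, not taken from the paper's test logs.
paths = ["/img/a.png", "/img/b.png", "/index.html"]
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"]
encoded_paths = prefix_encode(paths)
encoded_ips = mtf_encode(ips)
```

Both transforms are lossless and tend to turn repetitive field values into small integers and short remainders, which a general-purpose compressor such as gzip or bzip2 can then encode more compactly.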
Keywords :
Internet; data compression; file servers; dictionary based phrase sequence substitution; general-purpose compressors; improved Web log compression; lossless Apache Web log preprocessor; move-to-front coding; prefixes; server abuse patterns; subatomic field processing; suffixes; user activity; table compression; text compression; web logs;
Conference_Title :
Modern Problems of Radio Engineering, Telecommunications and Computer Science, 2008 Proceedings of International Conference on
Conference_Location :
Lviv-Slavsko
Print_ISBN :
978-966-553-678-9