DocumentCode
1637534
Title
Sub-atomic field processing for improved web log compression
Author
Deorowicz, Sebastian ; Grabowski, Szymon
Author_Institution
Inst. Informatyki, Politech. Śląska, Gliwice, Poland
fYear
2008
Firstpage
551
Lastpage
556
Abstract
Web log files, which store user activity on a server, may grow by hundreds of megabytes a day, or even more, on popular sites. It makes sense to archive old logs so that they can be analyzed later, e.g., to detect attacks or other server abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. Our method works on individual fields of log data (each storing information such as the client's IP, date/time, requested file or query, download size in bytes, etc.) and uses compression techniques such as finding and extracting common prefixes and suffixes, dictionary-based phrase sequence substitution, move-to-front coding, and more. The test results show that the proposed transform improves the average compression ratios 2.64 times in the case of gzip and 1.83 times in the case of bzip2.
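As an illustration of one technique named in the abstract, the following is a minimal, generic sketch of move-to-front coding applied to a column of field values. It is not the authors' implementation, which this record does not describe. The function names `mtf_encode` and `mtf_decode`, the choice of the client IP field, and the convention of emitting a first-seen value as a literal string are all assumptions made for the example.

```python
def mtf_encode(values):
    """Replace each repeated value by its position in a recency list.

    Assumption for this sketch: a value seen for the first time is
    emitted as a literal string; a repeated value is emitted as an int.
    """
    recent = []  # most recently used values first
    out = []
    for v in values:
        if v in recent:
            i = recent.index(v)
            out.append(i)       # small index means recently seen
            recent.pop(i)
        else:
            out.append(v)       # literal for a new value
        recent.insert(0, v)     # move (or add) the value to the front
    return out


def mtf_decode(codes):
    """Invert mtf_encode by replaying the same recency list."""
    recent = []
    out = []
    for c in codes:
        v = recent.pop(c) if isinstance(c, int) else c
        recent.insert(0, v)
        out.append(v)
    return out


# Hypothetical client-IP column taken from consecutive log lines.
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1", "10.0.0.3", "10.0.0.2"]
codes = mtf_encode(ips)
restored = mtf_decode(codes)
```

Because the same few clients tend to recur within short spans of a log, most emitted indices are small, which a general-purpose compressor such as gzip or bzip2 can then encode compactly.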
Keywords
Internet; data compression; file servers; dictionary based phrase sequence substitution; general-purpose compressors; improved Web log compression; lossless Apache Web log preprocessor; move-to-front coding; prefixes; server abuse patterns; subatomic field processing; suffixes; user activity; table compression; text compression; web logs;
fLanguage
English
Publisher
IEEE
Conference_Titel
Modern Problems of Radio Engineering, Telecommunications and Computer Science, 2008 Proceedings of International Conference on
Conference_Location
Lviv-Slavsko
Print_ISBN
978-966-553-678-9
Type
conf
Filename
5423436