DocumentCode :
719404
Title :
Document Counting in Compressed Space
Author :
Gagie, Travis ; Hartikainen, Aleksi ; Karkkainen, Juha ; Navarro, Gonzalo ; Puglisi, Simon J. ; Siren, Jouni
Author_Institution :
Dept. of Comput. Sci., Univ. of Helsinki, Helsinki, Finland
fYear :
2015
fDate :
7-9 April 2015
Firstpage :
103
Lastpage :
112
Abstract :
We address the problem of counting the number of strings in a collection where a given pattern appears, which has applications in information retrieval and data mining. Existing solutions are in a theoretical stage. In this paper we implement these solutions and explore compressed variants, aiming to reduce data structure size. Our main result is to uncover some unexpected compressibility properties of the fastest known data structure for the problem. By taking advantage of these properties, we can reduce the size of the structure by a factor of 5-400, depending on the dataset.
Keywords :
data mining; document handling; information retrieval; string matching; compressed space; compressibility properties; data structure size reduction; document counting; string counting; Arrays; Data compression; Encoding; Indexes
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference (DCC), 2015
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2015.55
Filename :
7149267