Title :
Combining PPM models using a text mining approach
Author :
Teahan, W.J. ; Harper, David J.
Author_Institution :
Sch. of Comput. & Math. Sci., Robert Gordon Univ., Aberdeen, UK
Abstract :
This paper introduces a novel switching method that can be used to combine two or more PPM models. The method derives from our earlier work on modelling English and on text mining, and it takes advantage of both to improve compression performance significantly. The combination of models performs at least as well as (and in many cases significantly better than) the best-performing individual model. The paper reviews PPM-based text mining, as it underpins the approach taken by the algorithm. It then describes how PPM models are combined by applying a novel variation of the Viterbi algorithm. Results are then presented, followed by a discussion of related work and conclusions.
Keywords :
data compression; text analysis; English; PPM models; Viterbi algorithm; compression performance; switching method; text mining; Data mining; Information analysis; Mathematical model; Natural languages; Performance loss; Source coding; Uniform resource locators
Conference_Title :
Data Compression Conference, 2001. Proceedings. DCC 2001.
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-1031-0
DOI :
10.1109/DCC.2001.917146