DocumentCode :
3603874
Title :
Generalized Context Modeling With Multi-Directional Structuring and MDL-Based Model Selection for Heterogeneous Data Compression
Author :
Wenrui Dai ; Hongkai Xiong ; Jia Wang ; Samuel Cheng ; Yuan F. Zheng
Author_Institution :
Dept. of Biomed. Inf., Univ. of California, San Diego, La Jolla, CA, USA
Volume :
63
Issue :
21
fYear :
2015
Firstpage :
5650
Lastpage :
5664
Abstract :
This paper proposes generalized context modeling (GCM) for heterogeneous data compression. The proposed model extends the suffix of predicted subsequences in classic context modeling to arbitrary combinations of symbols in multiple directions. To address the selection of contexts, GCM constructs a model graph whose nodes are finite-order combinations of predicted symbols obtained by combinatorial structuring. The estimated probability for prediction is obtained by weighting over a class of context models that contain all the occurrences of nodes in the model graph. Moreover, separable context modeling in each direction is adopted for efficient prediction. To find the optimal class of context models for prediction, a normalized maximum likelihood (NML) function is developed to estimate their structures and parameters, especially for large heterogeneous data. It is further refined by context pruning to exclude redundant models. Such model selection is optimal in the sense of the minimum description length (MDL) principle, whose divergence is proven to be consistent with the actual distribution. It is shown that the upper bounds on model redundancy for GCM are independent of the data size. GCM is validated on a wide range of data, e.g., the Calgary corpus, executable files, and genomic data. Experimental results show that it outperforms most state-of-the-art context modeling algorithms reported.
Keywords :
data compression; graph theory; maximum likelihood estimation; probability; GCM; MDL-based model selection; NML function; context pruning; generalized context modeling; heterogeneous data compression; minimum description length principle; model graph; model redundancy; multidirectional structuring; normalized maximum likelihood function; Adaptation models; Context; Context modeling; Data models; Predictive models; minimum description length; model selection
fLanguage :
English
Journal_Title :
Signal Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1053-587X
Type :
jour
DOI :
10.1109/TSP.2015.2458784
Filename :
7163359