DocumentCode
659466
Title
Sparse Poisson coding for high dimensional document clustering
Author
Chenxia Wu ; Haiqin Yang ; Jianke Zhu ; Jiemi Zhang ; Irwin King ; Michael R. Lyu
Author_Institution
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
512
Lastpage
517
Abstract
Document clustering plays an important role in large-scale textual data analysis, where the high dimensionality of textual data poses a great challenge. One remedy is to learn a high-level sparse representation using sparse coding techniques. In contrast to traditional Gaussian noise-based sparse coding methods, in this paper we employ a Poisson distribution model to represent the word-count frequency features of a text for sparse coding. Moreover, a novel sparse-constrained Poisson regression algorithm is proposed to solve the induced optimization problem. Unlike previous Poisson regression methods, which use the family of ℓ1-regularization to encourage sparse solutions, we introduce a sparsity ratio measure that makes use of both the ℓ1-norm and the ℓ2-norm of the learned weights. An important advantage of the sparsity ratio is that it is bounded between 0 and 1, which makes it easy to set in practical applications. To further make the algorithm tractable for high-dimensional textual data, a projected gradient descent algorithm is proposed to solve the regression problem. Extensive experiments show that the proposed approach achieves an effective representation for document clustering compared with state-of-the-art regression methods.
Keywords
Poisson distribution; data analysis; gradient methods; learning (artificial intelligence); optimisation; pattern clustering; regression analysis; text analysis; ℓ1-norm; ℓ1-regularization; ℓ2-norm; Gaussian noise-based sparse coding method; Poisson distribution model; gradient descent algorithm; high dimensional document clustering; high-level sparse representation learning; large scale textual data analysis; optimization problem; regression problem; sparse Poisson coding; sparse coding technique; sparse-constrained Poisson regression algorithm; sparsity ratio measure; word-count frequency feature; Algorithm design and analysis; Clustering algorithms; Data models; Encoding; Measurement; Optimization; Vectors; Poisson regression; document clustering; sparse coding;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691615
Filename
6691615