Title of article
Document clustering using the LSI subspace signature model
Author/Authors
W.Z. Zhu, R.B. Allen
Issue Information
Monthly, with serial issue number, year 2013
Pages
17
From page
844
To page
860
Abstract
We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
Keywords
text mining, knowledge representation, automatic classification
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2013
Record number
994847