DocumentCode :
2830441
Title :
Semantically Rich Spaces for Document Clustering
Author :
Basili, Roberto ; Marocco, Paolo ; Milizia, Daniele
Author_Institution :
Dept. of Comput. Sci., Rome Univ., Rome
fYear :
2008
fDate :
1-5 Sept. 2008
Firstpage :
43
Lastpage :
47
Abstract :
Dimensionality reduction techniques address a relevant problem of vector space models: the size of the dictionaries involved. Certain geometrical transformations applied to the original feature space, such as latent semantic analysis (LSA), aim at preserving and discovering semantic relations between documents within low-dimensional spaces. In this paper, a linear transformation method, locality preserving projections (LPP), is evaluated on a document clustering task, and the results are compared with LSA. Here LPP is applied directly to the original space through an efficient C-based implementation, and different parameterizations are investigated. Experimental results suggest that LPP is an effective technique that can exploit available a priori knowledge within an unsupervised learning framework.
Keywords :
data reduction; document handling; information retrieval; pattern clustering; unsupervised learning; dimensionality reduction technique; document clustering; geometrical transformation; information retrieval; latent semantic analysis; linear transformation method; locality preserving projection; semantic relation discovery; unsupervised learning framework; vector space model problem; Databases; Dictionaries; Expert systems; Functional analysis; Independent component analysis; Information retrieval; Large-scale systems; Linear discriminant analysis; Solid modeling; Vectors; Document clustering; Latent Semantic Analysis; Linear embedding; Locality Preserving Projection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Application, 2008. DEXA '08. 19th International Workshop on
Conference_Location :
Turin
ISSN :
1529-4188
Print_ISBN :
978-0-7695-3299-8
Type :
conf
DOI :
10.1109/DEXA.2008.109
Filename :
4624689