DocumentCode :
3425143
Title :
A comparison of dimensionality reduction techniques for text retrieval
Author :
Vinay, Vishwa ; Cox, Ingemar J. ; Wood, Ken ; Milic-Frayling, Natasa
Author_Institution :
Dept. of Comput. Sci., Univ. Coll. London, UK
fYear :
2005
fDate :
15-17 Dec. 2005
Abstract :
The growth of digital information increases the need to build better techniques for automatically storing, organizing and retrieving it. Much of this information is textual in nature and existing representation models struggle to deal with the high dimensionality of the resulting feature space. Techniques like latent semantic indexing address, to some degree, the problem of high dimensionality in information retrieval. However, promising alternatives, like random mapping (RM), have yet to be completely studied in this context. In this paper, we show that despite the attention RM has received in other applications, in the case of text retrieval it is outperformed not only by principal component analysis (PCA) and independent component analysis (ICA) but also by a simple noise reduction algorithm.
Keywords :
data reduction; independent component analysis; information retrieval; principal component analysis; text analysis; digital information; dimensionality reduction; latent semantic indexing; noise reduction; random mapping; representation model; text retrieval; Computational efficiency; Computer science; Educational institutions; Indexing; Machine learning; Organizing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications, 2005. Proceedings. Fourth International Conference on
Print_ISBN :
0-7695-2495-8
Type :
conf
DOI :
10.1109/ICMLA.2005.2
Filename :
1607465