Title :
Evaluation of three dimensionality reduction techniques for document classification
Author :
Luo, Xiao ; Zincir-Heywood, A. Nur
Author_Institution :
Fac. of Comput. Sci., Dalhousie Univ., NS, Canada
Abstract :
High-dimensional document collections restrict the choice of data processing methods, especially machine learning methods, which need to calculate inter-vector distances. This paper describes the development and evaluation of three dimensionality reduction methods for document representation: latent semantic indexing (LSI), random mapping (RM), and the combination of the two (RM+LSI). We are interested in how far these dimensionality reduction methods affect the accuracy of document categorization. The results show that LSI performs better in terms of the F1-measure; however, RM+LSI performs very nearly as well at a much lower computational cost.
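The three reductions named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Gaussian random matrix with unit-length rows, the toy document-term matrix, and the target dimensions (50 and 5) are all assumptions chosen only for illustration.

```python
import numpy as np

def random_mapping(X, k, rng):
    """Project a docs-by-terms matrix X to k dimensions with a random matrix.

    Assumption: Gaussian entries, with each term's random vector scaled to
    unit length. Other constructions are possible.
    """
    R = rng.standard_normal((X.shape[1], k))
    R /= np.linalg.norm(R, axis=1, keepdims=True)  # one unit vector per term
    return X @ R

def lsi(X, k):
    """Rank-k latent semantic indexing via a truncated SVD.

    Returns document coordinates (docs by k) in the latent space.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def rm_then_lsi(X, k_rm, k_lsi, rng):
    """Combined method: cheap random projection first, then LSI on the result."""
    return lsi(random_mapping(X, k_rm, rng), k_lsi)

rng = np.random.default_rng(0)
# Toy document-term counts (20 documents, 200 terms); made-up data.
X = rng.poisson(0.3, size=(20, 200)).astype(float)

Z_lsi = lsi(X, 5)
Z_rm = random_mapping(X, 50, rng)
Z_both = rm_then_lsi(X, 50, 5, rng)
print(Z_lsi.shape, Z_rm.shape, Z_both.shape)
```

In this sketch the combined variant runs the SVD on a 50-column matrix instead of the full 200-column one, which is the likely source of the lower computational cost the abstract reports for RM+LSI.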
Keywords :
classification; computational complexity; data structures; information retrieval; learning (artificial intelligence); self-organising feature maps; text analysis; automated text analysis tools; computational cost; data processing; data representation; dimensionality reduction techniques; document categorization; document classification; document representation; hierarchical self organizing feature map architecture; inter-vector distances; latent semantic indexing; machine learning methods; random mapping; Computational efficiency; Computer science; Data processing; Frequency; Indexing; Large scale integration; Learning systems; Machine learning algorithms; Neural networks; Text analysis;
Conference_Title :
Electrical and Computer Engineering, 2004. Canadian Conference on
Print_ISBN :
0-7803-8253-6
DOI :
10.1109/CCECE.2004.1344986