DocumentCode
2480976
Title
Feature Space Transformations in Document Clustering
Author
Csorba, Kristóf ; Vajk, István
Author_Institution
Dept. of Autom. & Appl. Informatics, Budapest Univ. of Technol. & Econ.
fYear
0
fDate
0-0 0
Firstpage
175
Lastpage
179
Abstract
Document clustering is a part of information retrieval in which documents written in natural language are assigned to groups based on some criteria. In the present case, documents with similar topics are grouped together. Because many methods and additional noise filtering techniques exist for this task, this paper focuses on the composition of such transformations and on the comparison of configurations built from subsets of these transformations, which serve as tiles of the whole procedure. Five tile methods are used: term filtering, frequency quantizing, principal component analysis (PCA), term clustering, and document clustering. The configurations are compared by maximal achieved F-measure and time consumption to find the best composition.
Keywords
document handling; information retrieval; pattern clustering; principal component analysis; PCA; document clustering; document collection; document retrieval; feature space transformation; frequency quantization; information retrieval; natural language; noise filtering technique; principal component analysis; term clustering; term filtering; Automation; Filtering; Frequency; Informatics; Information retrieval; Natural languages; Principal component analysis; Space technology; Text analysis; Tiles;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Engineering Systems, 2006. INES '06. Proceedings. International Conference on
Conference_Location
London
Print_ISBN
0-7803-9708-8
Type
conf
DOI
10.1109/INES.2006.1689364
Filename
1689364