DocumentCode
2230455
Title
A Simple and Fast Term Selection Procedure for Text Clustering
Author
Gonzaga, Luiz ; Grivet, Marco ; Tereza Vasconcelos, A.
Author_Institution
Laboratorio Nacional de Computacao Cientifica, Rio de Janeiro
fYear
2007
fDate
20-24 Oct. 2007
Firstpage
777
Lastpage
781
Abstract
Text clustering is a theme that is receiving considerable attention nowadays in areas such as text mining and information retrieval. A starting point for clustering methods applied to unstructured document collections is the creation of a vector-space model, usually known as the bag-of-words model [1]. Documents are then usually described by a matrix that is huge and extremely sparse, owing to the very large number of terms describing the set of documents. Although several techniques can be employed to reduce this number, the final figure is still high, leading to a feature space of high dimensionality. This paper presents a simple procedure that not only considerably reduces the dimensionality of the feature space, and hence the processing time, but also produces clustering performance comparable to or even better than that obtained with the full set of terms.
Keywords
data mining; data reduction; information retrieval; pattern clustering; sparse matrices; text analysis; dimensionality reduction; feature space; information retrieval; sparse matrix; text clustering; text mining; unstructured document collection; vector-space model; Abstracts; Broadcasting; Clustering algorithms; Clustering methods; Frequency; Information retrieval; Intelligent systems; Neodymium; Sparse matrices; Text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems Design and Applications, 2007. ISDA 2007. Seventh International Conference on
Conference_Location
Rio de Janeiro
Print_ISBN
978-0-7695-2976-9
Type
conf
DOI
10.1109/ISDA.2007.15
Filename
4389702