DocumentCode :
628066
Title :
Sparse data for document clustering
Author :
Veritawati, Ionia ; Wasito, Ito ; Mujiono
Author_Institution :
Comput. Sci., Univ. of Indonesia, Depok, Indonesia
fYear :
2013
fDate :
20-22 March 2013
Firstpage :
38
Lastpage :
43
Abstract :
Document clustering, which is part of the text mining framework, is used to group model data and a real collection of cancer documents into several clusters. A vector space model of the documents is built from their key phrases; the resulting matrix is sparse, meaning that it contains many zero values. Sparse dimensionality reduction and several clustering methods, including K-means, self-organizing maps, and non-negative matrix factorization (NMF), are applied to the data, and the results are compared. Using a sparse method based on the Arnoldi method in the dimensionality reduction step yields a clustering validity twice that obtained with standard dimensionality reduction.
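The pipeline described in the abstract (sparse key-phrase matrix, sparse dimensionality reduction, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the function names, the binary weighting, and the small K-means routine are all assumptions made for illustration. The reduction step uses SciPy's `svds`, which relies on ARPACK, an implicitly restarted Arnoldi-type solver (the Lanczos variant for this symmetric case); the record does not say how the paper applies the Arnoldi method.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical toy corpus: each document is a list of key phrases.
docs = [
    ["breast cancer", "tumor", "chemotherapy"],
    ["breast cancer", "chemotherapy", "mastectomy"],
    ["breast cancer", "tumor", "mastectomy", "chemotherapy"],
    ["breast cancer", "tumor"],
    ["lung cancer", "smoking", "biopsy"],
    ["lung cancer", "biopsy"],
    ["lung cancer", "smoking", "radiation"],
]

def build_sparse_matrix(docs):
    """Build a document-by-phrase matrix (binary weights, an assumption)."""
    vocab = {p: j for j, p in enumerate(sorted({p for d in docs for p in d}))}
    rows, cols = [], []
    for i, doc in enumerate(docs):
        for phrase in doc:
            rows.append(i)
            cols.append(vocab[phrase])
    data = np.ones(len(rows))
    return csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocab))), vocab

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's K-means with farthest-first initialisation (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_documents(docs, n_components=2, n_clusters=2):
    A, _ = build_sparse_matrix(docs)
    # Truncated SVD computed directly on the sparse matrix via ARPACK.
    U, s, _ = svds(A, k=n_components)
    Z = U * s
    # Normalise each document vector so clustering compares directions.
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    return kmeans(Z, n_clusters)

labels = cluster_documents(docs)
print(labels)
```

On this toy corpus the two topic groups share no key phrases, so the first four documents should receive one label and the last three the other; which group gets label 0 depends on the initialisation.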
Keywords :
data mining; matrix decomposition; pattern clustering; self-organising feature maps; sparse matrices; text analysis; Arnoldi method; NMF; cancer document; clustering validity; document clustering; k-means clustering; key phrase; nonnegative matrices factorization; real data collection; self organizing; sparse data; sparse dimensional reduction; sparse matrix; sparse method; text mining framework; vector space model; zero value; Data models; Indexes; Organizing; Principal component analysis; Sparse matrices; Standards; Vectors; arnoldi method; competitive learning; k-means; non-negative matrices factorization; self-organizing; sparse;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Communication Technology (ICoICT), 2013 International Conference of
Conference_Location :
Bandung
Print_ISBN :
978-1-4673-4990-1
Type :
conf
DOI :
10.1109/ICoICT.2013.6574546
Filename :
6574546