Regional Information Center for Science and Technology - Text Document Preprocessing and Dimension Reduction Techniques for Text Document Clustering

DocumentCode :

3708213

Title :

Text Document Preprocessing and Dimension Reduction Techniques for Text Document Clustering

Author :

Ammar Ismael Kadhim;Yu-N Cheah;Nurul Hashimah Ahamed

Author_Institution :

Sch. of Comput. Sci., Univ. Sains Malaysia, Minden, Malaysia

Year :

2014

Firstpage :

Lastpage :

Abstract :

Text mining generally refers to the process of extracting interesting (non-trivial) features and knowledge from unstructured text documents. It is an interdisciplinary field that draws on information retrieval, data mining, machine learning, parameter statistics and computational linguistics. Standard text mining and information retrieval techniques for text documents usually rely on similar categories. An alternative method of retrieving information is to cluster the documents, which requires preprocessing the text. The preprocessing steps have a large effect on how successfully knowledge can be extracted. This study implements TF-IDF and singular value decomposition (SVD) as dimensionality reduction techniques. The proposed system presents effective preprocessing and dimensionality reduction techniques that support document clustering with the k-means algorithm. The experimental results show that the proposed method enhances the performance of English text document clustering. Simulation results on the BBC news and BBC sport datasets show the superiority of the proposed algorithm.
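
The pipeline named in the abstract (TF-IDF weighting, SVD-based dimensionality reduction, then k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their preprocessing steps or parameters, so everything below is an assumption made for illustration. The toy corpus `DOCS`, the whitespace tokenizer, the smoothed IDF formula, the power-iteration stand-in for a library SVD routine, the farthest-point k-means initialization, and all function names (`tfidf_rows`, `svd_document_coords`, `kmeans`, `cluster_documents`) are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_rows(docs):
    """Build L2-normalized TF-IDF rows (one per document) with a naive tokenizer."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumed IDF variant: log(N / df) + 1, so terms found everywhere keep some weight.
    idf = {tok: math.log(len(docs) / doc_freq[tok]) + 1.0 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[tok] / max(len(toks), 1) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows


def svd_document_coords(rows, k, iters=200):
    """Approximate rank-k SVD coordinates (U_k * S_k) of the document-term matrix.

    Uses power iteration with deflation on the Gram matrix A A^T, whose
    eigenvectors are the left singular vectors of A.
    """
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    coords = [[] for _ in range(n)]
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(a * b for a, b in zip(v, gv))
        sigma = math.sqrt(max(eigval, 0.0))  # singular value of A
        for i in range(n):
            coords[i].append(v[i] * sigma)
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[i][j] - eigval * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return coords


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_documents(docs, n_components=2, n_clusters=2):
    """TF-IDF -> reduced SVD coordinates -> k-means labels."""
    rows = tfidf_rows(docs)
    coords = svd_document_coords(rows, n_components)
    return kmeans(coords, n_clusters)


# Made-up toy corpus: two sports-like and two finance-like documents, interleaved.
DOCS = [
    "football match goal team win",
    "stock market shares fall bank",
    "team scores goal in football final",
    "bank raises rates as market shares rise",
]
labels = cluster_documents(DOCS)
print(labels)
```

On this toy corpus the two sports-like documents receive one label and the two finance-like documents the other. A realistic run would replace the whitespace tokenizer with proper preprocessing and use a tested numerical library for the SVD step.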

Keywords :

"Clustering algorithms","Text mining","Algorithm design and analysis","Data models","Singular value decomposition","Indexing"

Publisher :

IEEE

Conference_Title :

Artificial Intelligence with Applications in Engineering and Technology (ICAIET), 2014 4th International Conference on

Type :

conf

DOI :

10.1109/ICAIET.2014.21

Filename :

7351815

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3708213