Title :
Text clustering using statistical and semantic data
Author :
Benghabrit, Asmaa ; Ouhbi, Brahim ; Behja, Hicham ; Frikh, Bouchra
Author_Institution :
LM2I Lab., Moulay Ismail Univ., Meknès, Morocco
Abstract :
The explosive growth of information stored in unstructured text has created great demand for new and powerful tools, such as text mining, for acquiring useful information. Document clustering is one of its most powerful methods, by which document retrieval, organization and summarization can be achieved. However, it becomes challenging when dealing with large volumes of data, owing to the high dimensionality of the feature space and the semantic correlation between features. In this paper, we propose a new sequential document clustering algorithm that uses statistical and semantic feature selection methods. The semantic process is intended to improve the frequency-based mechanism by exploiting the semantic relations within the text documents. The proposed algorithm iteratively selects relevant features and performs clustering until convergence. To evaluate its performance, experiments were conducted on two corpora. The results show that our algorithm outperforms the existing algorithms.
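The loop described in the abstract (select features, cluster, repeat until the partition stops changing) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the selection criterion or the clustering step, so the sketch assumes mutual information between term presence and cluster label (a term that appears in the keywords) and a simple centroid-overlap reassignment. The semantic selection step is omitted entirely, and all function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """MI (in nats) between presence of `term` and the cluster label."""
    n = len(docs)
    joint = Counter((term in d, c) for d, c in zip(docs, labels))
    p_term = Counter(term in d for d in docs)
    p_cluster = Counter(labels)
    return sum(
        (n_tc / n) * math.log(n_tc * n / (p_term[t] * p_cluster[c]))
        for (t, c), n_tc in joint.items()
    )

def select_features(docs, labels, k):
    """Keep the k terms most informative about the current partition."""
    vocab = sorted({w for d in docs for w in d})
    return sorted(vocab, key=lambda w: -mutual_information(docs, labels, w))[:k]

def assign(docs, labels, features, n_clusters):
    """Move each document to the cluster whose members share most of its selected terms."""
    feats = set(features)
    sizes = Counter(labels)
    centroids = [Counter() for _ in range(n_clusters)]
    for d, c in zip(docs, labels):
        centroids[c].update(d & feats)

    def score(d, c):
        if sizes[c] == 0:
            return 0.0
        return sum(centroids[c][w] for w in d & feats) / sizes[c]

    # Ties are broken in favour of the document's current cluster.
    return [
        max(range(n_clusters), key=lambda c: (score(d, c), c == old))
        for d, old in zip(docs, labels)
    ]

def iterative_clustering(docs, init_labels, n_clusters, k, max_iter=20):
    """Alternate feature selection and reassignment until labels stop changing."""
    labels = list(init_labels)
    features = []
    for _ in range(max_iter):
        features = select_features(docs, labels, k)
        new_labels = assign(docs, labels, features, n_clusters)
        if new_labels == labels:  # convergence: the partition is stable
            break
        labels = new_labels
    return labels, features

# Toy documents as token sets (assumed data); document 2 starts in the wrong cluster.
docs = [
    {"cat", "dog", "pet"}, {"dog", "pet", "vet"}, {"cat", "pet", "vet"},
    {"stock", "bond", "market"}, {"bond", "market", "fund"}, {"stock", "market", "fund"},
]
labels, features = iterative_clustering(docs, [0, 0, 1, 1, 1, 1], n_clusters=2, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the misplaced document is moved to the first cluster after one pass, and the second pass leaves the partition unchanged, which is the convergence condition. A semantic step, as proposed in the paper, would presumably adjust the feature scores before selection; how it does so is not specified in the abstract.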
Keywords :
data mining; information retrieval; organisational aspects; pattern clustering; statistical analysis; text analysis; document organization; document retrieval; document summarization; semantic correlation; semantic data; semantic feature selection methods; semantic process; sequential document clustering algorithm; statistical data; statistical feature selection methods; text clustering; text documents; text mining; unstructured texts; Algorithm design and analysis; Clustering algorithms; Convergence; Mutual information; Semantics; Text mining; Vectors; Text mining; clustering; feature selection methods; performance analysis;
Conference_Title :
Computer and Information Technology (WCCIT), 2013 World Congress on
Conference_Location :
Sousse
Print_ISBN :
978-1-4799-0460-0
DOI :
10.1109/WCCIT.2013.6618782