DocumentCode :
1772663
Title :
A hybrid feature selection algorithm for web document clustering
Author :
Benghabrit, Asmaa ; Ouhbi, Brahim ; Zemmouri, El Moukhtar ; Frikh, Bouchra ; Behja, Hicham
Author_Institution :
LM2I Lab., Moulay Ismail Univ., Meknès, Morocco
fYear :
2014
fDate :
28-30 May 2014
Firstpage :
216
Lastpage :
222
Abstract :
Not all features in a dataset are important: some are redundant or irrelevant. Feature selection, an effective dimensionality reduction technique, is therefore essential for web document clustering, where it amounts to selecting the features that matter for the underlying clusters. To guide the web document clustering process, we propose a hybrid feature selection algorithm that uses a weighting model to simultaneously select the most statistically and semantically informative features. The clustering process alternates between selecting relevant features and clustering the documents, iterating until stability is reached. The experimental results demonstrate the practical aspects of our algorithm and show that it produces more efficient clustering than other existing algorithms.
Keywords :
Internet; feature selection; stability; statistical analysis; Web document clustering process; hybrid feature selection algorithm; semantic informative features; statistical informative features; weighting model; Algorithm design and analysis; Clustering algorithms; Convergence; Feature extraction; Mutual information; Semantics; Vectors; Clustering; Feature selection methods; Performance analysis; Statistical and semantic analysis; Web documents
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Next Generation Networks and Services (NGNS), 2014 Fifth International Conference on
Conference_Location :
Casablanca
Print_ISBN :
978-1-4799-6608-0
Type :
conf
DOI :
10.1109/NGNS.2014.6990255
Filename :
6990255