Title :
A hybrid feature selection algorithm for web document clustering
Author :
Benghabrit, Asmaa ; Ouhbi, Brahim ; Zemmouri, El Moukhtar ; Frikh, Bouchra ; Behja, Hicham
Author_Institution :
LM2I Lab., Moulay Ismail Univ., Meknès, Morocco
Abstract :
Not all features in a dataset are important: some are redundant or irrelevant. Feature selection, an effective dimensionality reduction technique, is therefore essential for web document clustering, where it amounts to selecting the features that matter for the underlying clusters. To guide the web document clustering process, we propose a hybrid feature selection algorithm that simultaneously selects the most statistically and semantically informative features through a weighting model. The clustering process alternates between selecting relevant features and clustering the documents, iterating until stability is reached. The experimental results demonstrate the practicality of our algorithm and show that it produces more efficient clustering than other existing algorithms.
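The select-then-cluster loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so mutual information (taken from the keyword list) stands in for the statistical score, the `semantic` dictionary of per-term weights is a hypothetical user-supplied input, and the mixing weight `alpha`, the feature budget `k`, the seed documents, and the overlap-based cluster assignment are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI (natural log) between presence of `term` and the cluster label."""
    n = len(docs)
    joint = Counter((term in d, c) for d, c in zip(docs, labels))
    p_term = Counter(term in d for d in docs)
    p_label = Counter(labels)
    # Only observed (presence, label) pairs are summed, so log(0) never occurs.
    return sum(
        (n_tc / n) * math.log(n_tc * n / (p_term[t] * p_label[c]))
        for (t, c), n_tc in joint.items()
    )


def hybrid_select(docs, labels, semantic, alpha=0.5, k=2):
    """Rank terms by alpha * statistical + (1 - alpha) * semantic; keep top k.

    `semantic` is a hypothetical dict of term -> weight; missing terms get 0.
    """
    vocab = sorted(set().union(*docs))
    score = {
        t: alpha * mutual_information(docs, labels, t)
        + (1 - alpha) * semantic.get(t, 0.0)
        for t in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def cluster(docs, features, seeds):
    """Assign each doc to the seed doc sharing the most selected features."""
    selected = set(features)
    labels = []
    for d in docs:
        overlaps = [len(d & docs[s] & selected) for s in seeds]
        labels.append(overlaps.index(max(overlaps)))  # ties -> first seed
    return labels


def iterate_until_stable(docs, semantic, seeds, alpha=0.5, k=2, max_iter=20):
    """Alternate feature selection and clustering until labels stop changing."""
    features = sorted(set().union(*docs))
    labels = cluster(docs, features, seeds)
    for _ in range(max_iter):
        features = hybrid_select(docs, labels, semantic, alpha, k)
        new_labels = cluster(docs, features, seeds)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, features


if __name__ == "__main__":
    # Toy documents represented as sets of terms (illustrative data only).
    docs = [
        {"goal", "match", "team"},
        {"goal", "team", "league"},
        {"stock", "market", "bank"},
        {"market", "bank", "loan"},
    ]
    print(iterate_until_stable(docs, semantic={}, seeds=[0, 2]))
```

Here "stability" is read as the cluster assignment no longer changing between two consecutive iterations; the paper may use a different stopping criterion.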
Keywords :
Internet; feature selection; stability; statistical analysis; Web document clustering process; hybrid feature selection algorithm; semantic informative features; statistical informative features; weighting model; Algorithm design and analysis; Clustering algorithms; Convergence; Feature extraction; Mutual information; Semantics; Vectors; Clustering; Feature selection methods; Performance analysis; Statistical and semantic analysis; Web documents;
Conference_Title :
Next Generation Networks and Services (NGNS), 2014 Fifth International Conference on
Conference_Location :
Casablanca
Print_ISBN :
978-1-4799-6608-0
DOI :
10.1109/NGNS.2014.6990255