Title :
A Feature Reduction Technique for Improved Web Page Clustering
Author :
Mohamed, E.A.-H. ; El-Beltagy, Samhaa R. ; El-Gamal, Salwa
Author_Institution :
Dept. of Comput. Sci., Cairo Univ.
Abstract :
This paper presents a new approach to text feature reduction that can be used to speed up Web page clustering. The technique uses a classified corpus to build a dictionary that captures the importance of various terms in different categories. The dictionary is then used to translate an input document's feature vector into a smaller one. Two experiments carried out to evaluate the technique are also presented. The evaluation results show that clustering with the presented technique is much faster and more accurate than clustering without it. They also show that, despite being simpler, the presented technique can give results comparable to those of widely used feature reduction techniques.
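The dictionary-based reduction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula or the exact shape of the reduced vector, so everything below is an assumption made for illustration: the function names `build_dictionary` and `reduce_vector` are hypothetical, the term weight is taken to be the share of a term's occurrences that fall in each category, and the reduced vector is assumed to hold one value per category. It is not the authors' implementation.

```python
from collections import Counter, defaultdict


def build_dictionary(classified_corpus):
    """Map each term to per-category weights.

    classified_corpus: dict of category -> list of tokenised documents.
    ASSUMPTION: the weight is the share of a term's occurrences that
    fall in each category; the abstract does not specify the formula.
    """
    counts = defaultdict(Counter)  # term -> Counter(category -> count)
    for category, documents in classified_corpus.items():
        for tokens in documents:
            for term in tokens:
                counts[term][category] += 1
    dictionary = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        dictionary[term] = {c: n / total for c, n in per_category.items()}
    return dictionary


def reduce_vector(tokens, dictionary, categories):
    """Translate a document's term frequencies into one value per category.

    ASSUMPTION: the smaller vector has one dimension per category.
    """
    term_freq = Counter(tokens)
    reduced = [0.0] * len(categories)
    for term, tf in term_freq.items():
        weights = dictionary.get(term, {})  # unseen terms contribute nothing
        for i, category in enumerate(categories):
            reduced[i] += tf * weights.get(category, 0.0)
    return reduced


# Toy classified corpus (made-up data, for illustration only).
corpus = {
    "sports": [["goal", "match", "team"], ["team", "score"]],
    "finance": [["stock", "market", "score"], ["market", "bank"]],
}
categories = sorted(corpus)  # fixed category order: finance, sports
dictionary = build_dictionary(corpus)
reduced = reduce_vector(["team", "team", "market", "score"], dictionary, categories)
print(categories, reduced)
```

Under these assumptions the reduced vector's length equals the number of categories rather than the vocabulary size, which is where a speed-up in a later clustering step would come from.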
Keywords :
Internet; pattern classification; pattern clustering; text analysis; Web page clustering; classified corpus; dictionary; text feature reduction; Computer science; Dictionaries; Frequency; Independent component analysis; Indexing; Information retrieval; Large scale integration; Matrices; Noise reduction; Web pages;
Conference_Title :
Innovations in Information Technology, 2006
Conference_Location :
Dubai
Print_ISBN :
1-4244-0674-9
Electronic_ISBN :
1-4244-0674-9
DOI :
10.1109/INNOVATIONS.2006.301930