DocumentCode :
3387384
Title :
A Feature Reduction Technique for Improved Web Page Clustering
Author :
Mohamed, E.A.-H. ; El-Beltagy, Samhaa R. ; El-Gamal, Salwa
Author_Institution :
Dept. of Comput. Sci., Cairo Univ.
fYear :
2006
fDate :
Nov. 2006
Firstpage :
1
Lastpage :
5
Abstract :
This paper presents a new approach to text feature reduction that can be used to speed up Web page clustering. The technique uses a classified corpus to build a dictionary that captures the importance of various terms in different categories. The dictionary is then used to translate an input document's feature vector into a smaller one. Two experiments carried out to evaluate this technique are also presented. The evaluation results show that clustering with the presented technique is much faster and more accurate than clustering without it. They also show that, despite being simpler, the presented technique can give results comparable to those of currently widely used feature reduction techniques.
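The abstract describes the dictionary-based reduction only at a high level. The following is a minimal sketch of one possible reading, not the authors' method. It assumes that a term's importance in a category is its relative frequency in that category, normalized across categories so each term's weights sum to 1. It also assumes that the reduced vector has one entry per category. The function names `build_dictionary` and `reduce_vector` are hypothetical.

```python
from collections import Counter, defaultdict


def build_dictionary(corpus):
    """Build a term -> {category: weight} dictionary from a classified corpus.

    corpus: iterable of (category, list_of_tokens) pairs.
    ASSUMPTION (not from the paper): a term's weight in a category is its
    relative frequency there, normalized so its weights across categories sum to 1.
    """
    counts = defaultdict(Counter)  # category -> term counts
    for category, tokens in corpus:
        counts[category].update(tokens)
    categories = sorted(counts)
    totals = {c: sum(counts[c].values()) for c in categories}
    vocab = set().union(*(counts[c].keys() for c in categories))

    dictionary = {}
    for term in vocab:
        # Relative frequency of the term in each non-empty category.
        rel = {c: counts[c][term] / totals[c] for c in categories if totals[c]}
        norm = sum(rel.values())  # > 0 because the term occurs somewhere
        dictionary[term] = {c: r / norm for c, r in rel.items() if r > 0}
    return categories, dictionary


def reduce_vector(term_freqs, categories, dictionary):
    """Map a document's term-frequency dict onto one score per category.

    Terms missing from the dictionary are dropped, so the output length is
    len(categories) rather than the vocabulary size.
    """
    index = {c: i for i, c in enumerate(categories)}
    reduced = [0.0] * len(categories)
    for term, tf in term_freqs.items():
        for category, weight in dictionary.get(term, {}).items():
            reduced[index[category]] += tf * weight
    return reduced


if __name__ == "__main__":
    corpus = [
        ("sports", ["goal", "match", "team"]),
        ("sports", ["team", "coach"]),
        ("tech", ["code", "team"]),
    ]
    cats, dic = build_dictionary(corpus)
    print(cats, reduce_vector({"goal": 2, "team": 1, "unknown": 5}, cats, dic))
```

The reduced vectors, rather than the full term vectors, would then be passed to whatever clustering algorithm is in use. Because the vector length depends only on the number of categories, the cost of each distance computation no longer grows with the vocabulary size.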
Keywords :
Internet; pattern classification; pattern clustering; text analysis; Web page clustering; classified corpus; dictionary; text feature reduction; Computer science; Dictionaries; Frequency; Independent component analysis; Indexing; Information retrieval; Large scale integration; Matrices; Noise reduction; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovations in Information Technology, 2006
Conference_Location :
Dubai
Print_ISBN :
1-4244-0674-9
Electronic_ISBN :
1-4244-0674-9
Type :
conf
DOI :
10.1109/INNOVATIONS.2006.301930
Filename :
4085445