• DocumentCode
    3387384
  • Title

    A Feature Reduction Technique for Improved Web Page Clustering

  • Author

    Mohamed, E.A.-H. ; El-Beltagy, Samhaa R. ; El-Gamal, Salwa

  • Author_Institution
    Dept. of Comput. Sci., Cairo Univ.
  • fYear
    2006
  • fDate
    Nov. 2006
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    This paper presents a new approach to text feature reduction that can be used to speed up Web page clustering. The technique uses a classified corpus to build a dictionary that captures the importance of various terms in different categories. The dictionary is then used to translate an input document's feature vector into a smaller one. Two experiments carried out to evaluate the technique are also presented. The results show that clustering with the presented technique is much faster and more accurate than clustering without it. They also show that, despite being simpler, the technique can give results comparable to those of widely used feature reduction techniques.
  • Keywords
    Internet; pattern classification; pattern clustering; text analysis; Web page clustering; classified corpus; dictionary; text feature reduction; Computer science; Dictionaries; Frequency; Independent component analysis; Indexing; Information retrieval; Large scale integration; Matrices; Noise reduction; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovations in Information Technology, 2006
  • Conference_Location
    Dubai
  • Print_ISBN
    1-4244-0674-9
  • Electronic_ISBN
    1-4244-0674-9
  • Type

    conf

  • DOI
    10.1109/INNOVATIONS.2006.301930
  • Filename
    4085445
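
The abstract describes building a term dictionary from a classified corpus and using it to translate a document's feature vector into a smaller one. The following is only a minimal sketch of that general idea, not the authors' method: the abstract does not give the weighting scheme, so the relative-frequency weights, the one-dimension-per-category output, and the names `build_dictionary` and `reduce_features` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_dictionary(corpus):
    """Build {term: {category: weight}} from (category, tokens) pairs.

    ASSUMPTION: the weight is the share of a term's occurrences that fall
    in each category. The abstract does not state the actual weighting.
    """
    counts = defaultdict(Counter)
    for category, tokens in corpus:
        for term in tokens:
            counts[term][category] += 1

    dictionary = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        dictionary[term] = {c: n / total for c, n in per_category.items()}
    return dictionary


def reduce_features(tokens, dictionary, categories):
    """Map a document's term counts onto one dimension per category.

    ASSUMPTION: the reduced vector has len(categories) entries, each the
    frequency-weighted sum of dictionary weights for that category.
    Terms missing from the dictionary are ignored.
    """
    term_counts = Counter(tokens)
    reduced = [0.0] * len(categories)
    for term, freq in term_counts.items():
        weights = dictionary.get(term)
        if weights is None:
            continue
        for i, category in enumerate(categories):
            reduced[i] += freq * weights.get(category, 0.0)
    return reduced


# Hypothetical toy corpus with two categories.
corpus = [
    ("sports", ["goal", "team", "match"]),
    ("tech", ["code", "team", "server"]),
]
categories = ["sports", "tech"]
dictionary = build_dictionary(corpus)
document = ["goal", "team", "code", "code", "unknown"]
reduced = reduce_features(document, dictionary, categories)
print(reduced)
```

In this sketch the clustering step would operate on `reduced`, whose length equals the number of categories rather than the vocabulary size. That smaller input is where a speed-up of the kind the abstract reports would come from, under the assumptions stated above.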