  • DocumentCode
    1922131
  • Title
    Neural networks for web page classification based on augmented PCA
  • Author
    Selamat, Ali; Omatu, Sigeru
  • Author_Institution
    Graduate Sch. of Eng., Osaka Prefecture Univ., Sakai, Japan
  • Volume
    3
  • fYear
    2003
  • fDate
    20-24 July 2003
  • Firstpage
    1792
  • Abstract
    Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network whose inputs combine principal components with class profile-based features (CPBF). Each news web page is represented using a term-weighting scheme. Because the number of unique words in the collection is large, principal component analysis (PCA) is used to select the most relevant features for classification. The output of the PCA is then augmented with feature vectors from the class profile, which contains the most regular words in each class, before being fed to the neural network. We manually selected the most regular words in each class and weighted them using an entropy weighting scheme. A fixed number of regular words from each class is used as feature vectors together with the reduced principal components from the PCA. These combined feature vectors are the input to the neural network for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy on the sports news datasets.
  • Keywords
    Internet; Web sites; classification; feature extraction; neural nets; principal component analysis; vectors; augmented PCA; automatic categorization; datasets; entropy weighting method; feature vectors; neural networks; term weighting scheme; web page classification method; Bayesian methods; Cellular neural networks; Entropy; Indexing; Telephony; Web pages; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2003. Proceedings of the International Joint Conference on
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-7898-9
  • Type
    conf
  • DOI
    10.1109/IJCNN.2003.1223679
  • Filename
    1223679
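
The feature construction outlined in the abstract (term weighting, PCA reduction, then augmentation with entropy-weighted class-profile words) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give formulas, so TF-IDF is assumed for the term weighting, the common log-entropy formulation is assumed for the entropy weighting, and the function names, the toy count matrix, and the `profile_idx` column choice are all hypothetical. The neural network itself is omitted; the resulting matrix is what would be passed to it.

```python
import numpy as np

def tf_idf(counts):
    """TF-IDF weighting of a (documents x terms) count matrix (assumed scheme)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    doc_freq = np.count_nonzero(counts, axis=0)
    idf = np.log(n_docs / np.maximum(doc_freq, 1))
    return counts * idf

def pca_reduce(x, k):
    """Project the rows of x onto its top-k principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def log_entropy(counts):
    """Log-entropy weighting (assumed formulation): log(1 + tf) * global weight."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    totals = counts.sum(axis=0)
    p = counts / np.maximum(totals, 1e-12)
    safe_p = np.where(p > 0, p, 1.0)          # log(1) = 0 where p == 0
    global_w = 1.0 + (p * np.log(safe_p)).sum(axis=0) / np.log(n_docs)
    return np.log1p(counts) * global_w

def build_features(counts, profile_idx, k):
    """Concatenate k principal components with entropy-weighted profile terms."""
    counts = np.asarray(counts, dtype=float)
    reduced = pca_reduce(tf_idf(counts), k)
    profile = log_entropy(counts[:, profile_idx])
    return np.hstack([reduced, profile])

# Toy data: 4 documents, 4 terms; columns 0 and 1 stand in for the
# manually selected "regular words" of the class profile (hypothetical).
counts = np.array([[3, 0, 1, 2],
                   [2, 1, 0, 3],
                   [0, 4, 1, 0],
                   [0, 3, 2, 1]])
features = build_features(counts, profile_idx=[0, 1], k=2)
print(features.shape)
```

Each row of `features` holds the reduced principal components followed by the class-profile weights for one document, so its width is `k` plus the number of profile terms.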