• DocumentCode
    3228821
  • Title

    Binary Cybergenre Classification Using Theoretic Feature Measures

  • Author

    Dong, Lie ; Walters, Christine ; Duffy, Jack ; Shepherd, Michael

  • Author_Institution
    Dalhousie Univ., Halifax, NS
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    313
  • Lastpage
    316
  • Abstract
    In this study, we investigated automatic genre classification for three common types of Web pages, examining the effect of three theoretic feature selection measures, a range of feature set sizes, and three machine classifiers on Web page classification accuracy in a set of controlled experiments. Our results are encouraging. We conclude that for binary classification tasks, at least for these Web page genres, it is possible to reach satisfactory results with small content-based feature sets generated with a sound feature selection measure. Furthermore, there is no evidence of interaction between these feature selection measures and the machine classifiers used.
  • Keywords
    Internet; classification; feature extraction; information retrieval; search engines; support vector machines; automatic Web page genre classification; binary cybergenre classification; content-based feature sets; machine classifier; theoretic feature selection measure; Automatic control; HTML; Information retrieval; Lifting equipment; Robustness; Search engines; Size control; Size measurement; Uniform resource locators; Web pages
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type

    conf

  • DOI
    10.1109/WI.2006.50
  • Filename
    4061384