• DocumentCode
    3323175
  • Title
    Automatic Web Page Categorization using Principal Component Analysis
  • Author
    Zhang, Richong ; Shepherd, Michael ; Duffy, Jack ; Watters, Carolyn
  • Author_Institution
    Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS
  • fYear
    2007
  • fDate
    Jan. 2007
  • Firstpage
    73
  • Lastpage
    73
  • Abstract
    Today's search engines retrieve tens of thousands of Web pages in response to fairly simple query articulations. These pages are retrieved on the basis of the query terms occurring in the Web pages and the popularity of the Web pages as per the link structure of the Web. However, these search engines do not take into account the broader information need of the user, such as the task in which the user is involved. This research investigates the automatic categorization of Web pages using principal component analysis. The research focuses on user tasks that involve searching for Web pages containing health information, education information, or shopping information. Initial results are encouraging, with recall and precision values slightly in excess of 80%.
  • Keywords
    Internet; classification; information needs; information retrieval; principal component analysis; search engines; automatic Web page categorization; information need; query articulation; search engine; Computer science; Lifting equipment; Probability; Support vector machine classification; Support vector machines; Uniform resource locators; Videos; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference on
  • Conference_Location
    Waikoloa, HI
  • ISSN
    1530-1605
  • Electronic_ISBN
    1530-1605
  • Type
    conf
  • DOI
    10.1109/HICSS.2007.98
  • Filename
    4076516