• DocumentCode
    3230337
  • Title

    PageSim: A Novel Link-Based Similarity Measure for the World Wide Web

  • Author

    Lin, Zhenjiang ; King, Irwin ; Lyu, Michael R.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    687
  • Lastpage
    693
  • Abstract
    The need to measure the similarity between Web pages arises in many Web applications, such as Web search engines and Web document classification. Given the unique characteristics of the Web, which is huge, rapidly growing, highly dynamic, and untrustworthy, we propose a novel link-based similarity measure called PageSim. Based on the strategy of PageRank score propagation, PageSim is efficient, scalable, stable, and "fairly" robust, and is therefore applicable to the Web. We present the intuitions behind the PageSim model and outline the model with mathematical definitions. We also suggest a pruning technique for efficient computation of PageSim scores, and conduct experiments to illustrate the effectiveness and special characteristics of PageSim.
  • Keywords
    Internet; Web sites; data mining; search engines; PageRank score propagation; PageSim link-based similarity measure; Web document classification; Web pages; Web searching engine; World Wide Web; mathematical definitions; pruning technique; Application software; Books; Computer science; Libraries; Mathematical model; Robustness; Scalability; Search engines; Web pages; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type

    conf

  • DOI
    10.1109/WI.2006.127
  • Filename
    4061454