• DocumentCode
    2572013
  • Title
    STRank: A SiteRank algorithm using semantic relevance and time frequency
  • Author
    Guo, Hongzhi ; Chen, Qingcai ; Wang, Xiaolong ; Wang, Zhiyong ; Wu, Yonghui
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen, China
  • fYear
    2009
  • fDate
    11-14 Oct. 2009
  • Firstpage
    4876
  • Lastpage
    4881
  • Abstract
    Most research on Web information processing concentrates on Web pages and the hyperlinks among them, ignoring the important fact that a Web page is just one building block of a whole Website. This situation has gradually changed in recent years because of needs such as Website reputation calculation and high-level Website structure mining. As a result, Website ranking has become a hot research topic, and various site ranking algorithms, such as SiteRank and AggregateRank, have been proposed. However, most existing Website ranking algorithms use only Website link graphs and usually do not take the content of Websites into consideration, which is clearly insufficient for reliable ranking of Websites. To address this issue, this paper introduces two content-based features, semantic relevance and time frequency, and proposes a new algorithm, STRank, based on these two features. We first conduct a series of experiments to verify the feasibility of these two factors for the site ranking task. The semantic relevance is then applied in the calculation of transition probabilities, and the updating frequency of sites is incorporated into the ranking. Since the traditional Kendall's τ distance and Spearman's footrule distance are not appropriate for evaluating site ranking, we modify them accordingly to evaluate Website ranking algorithms. Finally, our experiments show that the STRank algorithm outperforms existing approaches in both effectiveness and efficiency.
  • Keywords
    Web sites; data mining; graph theory; probability; search engines; Kendall's τ distance; STRank; SiteRank algorithm; Spearman's footrule distance; Web information processing; Web pages; Web site link graphs; Web site ranking; Web site reputation calculation; Web site structure mining; content based features; hyperlinks; semantic relevance; time frequency; transition probability; Algorithm design and analysis; Computer science; Cybernetics; Information processing; Information retrieval; Search engines; Space technology; Time frequency analysis; USA Councils; Web pages; STRank; semantic relevance; site ranking; time frequency; updating frequency;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
  • Conference_Location
    San Antonio, TX
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4244-2793-2
  • Electronic_ISBN
    1062-922X
  • Type
    conf
  • DOI
    10.1109/ICSMC.2009.5346321
  • Filename
    5346321
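The abstract evaluates site rankings with modified versions of Kendall's τ distance and Spearman's footrule distance, but the record does not say how they were modified. The sketch below therefore shows only the standard, unmodified definitions for two complete rankings of the same set of sites without ties. It is not the paper's evaluation metric, and the function names are hypothetical. Kendall's τ distance counts the pairs of sites that the two rankings order differently; Spearman's footrule sums how far each site moves between the two rankings.

```python
from itertools import combinations


def _positions(ranking):
    """Map each item (e.g. a site name) to its 0-based position in a ranked list."""
    return {item: pos for pos, item in enumerate(ranking)}


def _check_permutations(r1, r2):
    """Standard definitions assume both rankings order exactly the same items, with no ties."""
    if len(r1) != len(set(r1)) or len(r2) != len(set(r2)) or set(r1) != set(r2):
        raise ValueError("rankings must be permutations of the same items")


def kendall_tau_distance(r1, r2):
    """Number of discordant pairs: pairs whose relative order differs between r1 and r2."""
    _check_permutations(r1, r2)
    p1, p2 = _positions(r1), _positions(r2)
    discordant = 0
    for a, b in combinations(r1, 2):
        # The product is negative exactly when the two rankings disagree on the pair's order.
        if (p1[a] - p1[b]) * (p2[a] - p2[b]) < 0:
            discordant += 1
    return discordant


def spearman_footrule_distance(r1, r2):
    """Sum over all items of the absolute difference between their positions in r1 and r2."""
    _check_permutations(r1, r2)
    p1, p2 = _positions(r1), _positions(r2)
    return sum(abs(p1[x] - p2[x]) for x in r1)


if __name__ == "__main__":
    # Hypothetical site rankings, for illustration only.
    ranking_a = ["site-a", "site-b", "site-c", "site-d"]
    ranking_b = ["site-b", "site-a", "site-d", "site-c"]
    print(kendall_tau_distance(ranking_a, ranking_b))        # 2
    print(spearman_footrule_distance(ranking_a, ranking_b))  # 4
```

Both distances are 0 for identical rankings. For a completely reversed ranking of n items, the Kendall distance reaches its maximum of n(n-1)/2. The quadratic pair loop keeps the definition explicit; for large rankings an O(n log n) merge-sort inversion count computes the same Kendall value.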