Title :
MixPR-An Approach of Combining Content and Links of Web Page
Author_Institution :
Xi'an Univ. of Finance & Econ., Xi'an
Abstract :
PageRank is used in systems based on hyperlink structure, such as Google, while TFIDF is widely used in IR systems based on the vector space model (VSM). It is worthwhile to combine the advantages of the two approaches. In this paper, we set up a new model that uses both the content of Web pages and the links among them. We construct a transition probability matrix composed of link information and the relevance of each page to the given query, where relevance is measured by TFIDF. MixPR (mixed PageRank) is then obtained by solving the equation that uses this matrix as its coefficient matrix. In this model, the subset of pages used to compute TFIDF is first downloaded from the Internet, and the links starting from those pages are also stored on a local server. The importance of a page is therefore determined by both its content and its links. Experimental results showed that the new model worked well, with precision approaching that of TFIDF.
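The abstract does not state the exact form of the MixPR transition matrix. As a rough illustration only, the sketch below assumes one plausible reading: a random surfer follows an out-link with probability `alpha` and otherwise jumps to a page with probability proportional to its TFIDF relevance to the query, and the resulting stationary vector is found by power iteration. The helper names `tfidf_relevance` and `mixpr`, the TF and IDF weighting, the damping value `alpha=0.85`, and the dangling-page rule are all hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def tfidf_relevance(docs, query):
    """Score each document against a query as a sum of TF * IDF over query terms.

    Hypothetical helper: TF is term count / document length, IDF is log(N / df).
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length = len(tokens) or 1
        score = 0.0
        for term in query.lower().split():
            if df[term]:
                score += (tf[term] / length) * math.log(n / df[term])
        scores.append(score)
    return scores


def mixpr(links, relevance, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on an assumed mixed transition matrix.

    With probability alpha the surfer follows an out-link chosen uniformly;
    otherwise it jumps to page j with probability proportional to relevance[j].
    This mixing rule and the parameter names are assumptions, not the paper's formula.
    """
    n = len(relevance)
    total = sum(relevance)
    # Normalize relevance into a jump distribution; use uniform if nothing matches.
    r = [x / total for x in relevance] if total > 0 else [1.0 / n] * n
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - alpha) * r[j] for j in range(n)]
        for i in range(n):
            out = links.get(i, [])
            if out:
                share = alpha * scores[i] / len(out)
                for j in out:
                    new[j] += share
            else:
                # Dangling page (no out-links): spread its mass by relevance.
                for j in range(n):
                    new[j] += alpha * scores[i] * r[j]
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


if __name__ == "__main__":
    docs = ["web search engine ranking", "pagerank link analysis", "cooking recipes pasta"]
    links = {0: [1], 1: [0, 2], 2: [0]}  # page index -> indices of pages it links to
    rel = tfidf_relevance(docs, "pagerank ranking")
    print(mixpr(links, rel))
```

Under these assumptions, the relevance-weighted jump plays the role that the uniform teleport plays in plain PageRank, so a page's score reflects both its query relevance and the link structure around it; a page with zero TFIDF relevance can still score if relevant pages link to it.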
Keywords :
Internet; information retrieval; search engines; Google; Pagerank; Web page; hyperlink structure; transition probability matrix; vector space model; Content based retrieval; Databases; Delay; Equations; Finance; Search engines; Web pages; Web search; Web server
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2874-8
DOI :
10.1109/FSKD.2007.407