DocumentCode
2334163
Title
Who links to whom: mining linkage between Web sites
Author
Bharat, Krishna ; Chang, Bay-Wei ; Henzinger, Monika ; Ruhl, Matthias
Author_Institution
Google Inc., Mountain View, CA, USA
fYear
2001
fDate
2001
Firstpage
51
Lastpage
58
Abstract
Previous studies of the Web graph structure have focused on the graph structure at the level of individual pages. In actuality, the Web is a hierarchically nested graph, with domains, hosts and Web sites introducing intermediate levels of affiliation and administrative control. To better understand the growth of the Web, we need to understand its macro-structure in terms of the linkage between Web sites. We approximate this by studying the graph of the linkage between hosts on the Web. This was done based on snapshots of the Web taken by Google in Oct 1999, Aug 2000 and Jun 2001. The connectivity between hosts is represented by a directed graph, with hosts as nodes and weighted edges representing the count of hyperlinks between pages on the corresponding hosts. We demonstrate how such a "hostgraph" can be used to study connectivity properties of hosts and domains over time, and discuss a modified "copy model" to explain observed link weight distributions as a function of subgraph size. We discuss changes in the Web over time in the size and connectivity of Web sites and country domains. We also describe a data mining application of the hostgraph: a related host finding algorithm which achieves a precision of 0.65 at rank 3.
Keywords
data mining; directed graphs; hypermedia; information resources; Web graph structure; Web sites; administrative control; connectivity properties; country domains; data mining application; directed graph; hierarchically nested graph; host finding algorithm; hostgraph; hyperlinks; intermediate levels; macro-structure; modified copy model; observed link weight distributions; subgraph size; weighted edges; Aggregates; Bibliometrics; Citation analysis; Computer science; Couplings; Data mining; Navigation; Uniform resource locators; Web page design; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location
San Jose, CA
Print_ISBN
0-7695-1119-8
Type
conf
DOI
10.1109/ICDM.2001.989500
Filename
989500