• DocumentCode
    2131892
  • Title
    Temporal Evolution of the UK Web
  • Author
    Bordino, Ilaria ; Boldi, Paolo ; Donato, Debora ; Santini, Massimo ; Vigna, Sebastiano
  • Author_Institution
    Sapienza Univ. di Roma, Rome
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    909
  • Lastpage
    918
  • Abstract
    Recently, a new temporal dataset has been made public: it consists of a series of twelve 100M-page snapshots of the .uk domain. The Web graphs of the twelve snapshots have been merged into a single time-aware graph that provides constant-time access to temporal information. In this paper we present the first statistical analysis performed on this graph, with the goal of checking whether the information contained in the graph is reliable (i.e., whether it depends essentially on the appearance and disappearance of pages and links, or on the crawler behaviour). We perform a number of tests that show that the graph is actually reliable, and provide the first public data on the evolution of the Web that use a large scale and a significant diversity in the sites considered.
  • Keywords
    Internet; data mining; statistical analysis; temporal databases; UK Web; Web graphs; constant-time access; crawler behaviour; temporal dataset; temporal evolution; Conferences; Data analysis; Data mining; Frequency estimation; Large-scale systems; Performance analysis; Performance evaluation; Testing; Uniform resource locators; Web pages; Temporal-evolution; Web-characterization; Web-evolution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-0-7695-3503-6
  • Electronic_ISBN
    978-0-7695-3503-6
  • Type
    conf
  • DOI
    10.1109/ICDMW.2008.88
  • Filename
    4734022