• DocumentCode
    2507019
  • Title

    Representing Web graphs

  • Author

    Raghavan, Sriram ; Garcia-Molina, Hector

  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., CA, USA
  • fYear
    2003
  • fDate
    5-8 March 2003
  • Firstpage
    405
  • Lastpage
    416
  • Abstract
    A Web repository is a large special-purpose collection of Web pages and associated indexes. Many useful queries and computations over such repositories involve traversal and navigation of the Web graph. However, efficient traversal of huge Web graphs containing several hundred million vertices and a few billion edges is a challenging problem. An additional complication is the lack of a schema to describe the structure of Web graphs. As a result, naive graph representation schemes can significantly increase query execution time and limit the usefulness of Web repositories. We propose a novel representation for Web graphs, called an S-Node representation. We demonstrate that S-Node representations are highly space-efficient, enabling in-memory processing of very large Web graphs. In addition, we present detailed experiments that show that S-Node representations can significantly reduce query execution times when compared with other schemes for representing Web graphs.
  • Keywords
    Internet; data mining; data warehouses; database indexing; graph theory; query formulation; query processing; S-Node representation; Web graph navigation; Web graph representation scheme; Web graph traversal; Web pages collection; Web repository; query execution time; Collaborative work; Computer science; Indexing; Natural languages; Navigation; Performance analysis; Proposals; Search engines; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2003. Proceedings. 19th International Conference on
  • Print_ISBN
    0-7803-7665-X
  • Type

    conf

  • DOI
    10.1109/ICDE.2003.1260809
  • Filename
    1260809