• DocumentCode
    3501568
  • Title

    MSSG: A Framework for Massive-Scale Semantic Graphs

  • Author

    Hartley, Timothy D R ; Catalyurek, Umit ; Özgüner, Füsun ; Yoo, Andy ; Kohn, Scott ; Henderson, Keith

  • Author_Institution
    Dept. of Electr. & Comput. Eng., The Ohio State Univ.
  • fYear
    2006
  • fDate
    25-28 Sept. 2006
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    This paper presents a middleware framework for storing, accessing and analyzing massive-scale semantic graphs. The framework, MSSG, targets scale-free semantic graphs with O(10^12) (trillion) vertices and edges. Here, we present the overall architectural design of the framework, as well as a prototype implementation for cluster architectures. The sheer size of these massive-scale semantic graphs prohibits storing the entire graph in memory even on medium- to large-scale parallel architectures. We therefore propose a new graph database, grDB, for the efficient storage and retrieval of large scale-free semantic graphs on secondary storage. This new database supports the efficient and scalable execution of parallel out-of-core graph algorithms which are essential for analyzing semantic graphs of massive size. We have also developed a parallel out-of-core breadth-first search algorithm for performance study. To the best of our knowledge, it is the first of such algorithms reported in the literature. Experimental evaluations on large real-world semantic graphs show that the MSSG framework scales well, and grDB outperforms widely used open-source out-of-core databases, such as BerkeleyDB and MySQL, in the storage and retrieval of scale-free graphs.
  • Keywords
    SQL; graph theory; middleware; parallel architectures; tree searching; BerkeleyDB; MySQL; architectural design; cluster architectures; graph database; massive-scale semantic graphs; middleware framework; open-source out-of-core databases; parallel architectures; parallel out-of-core breadth-first search algorithm; parallel out-of-core graph algorithms; prototype implementation; scale-free semantic graphs; Algorithm design and analysis; Computer networks; Information retrieval; Large-scale systems; Middleware; Open source software; Parallel architectures; Proteins; Prototypes; Relational databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing, 2006 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1552-5244
  • Print_ISBN
    1-4244-0327-8
  • Electronic_ISBN
    1552-5244
  • Type

    conf

  • DOI
    10.1109/CLUSTR.2006.311857
  • Filename
    4100363