  • DocumentCode
    3141462
  • Title
    Declarative analysis of noisy information networks
  • Author
    Moustafa, Walaa Eldin; Namata, Galileo; Deshpande, Amol; Getoor, Lise
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    106
  • Lastpage
    111
  • Abstract
    There is growing interest in methods for analyzing data that describe networks of all types, including information, biological, physical, and social networks. Typically, the data describing these networks are observational, and thus noisy and incomplete; they are also often at the wrong level of fidelity and abstraction for meaningful data analysis. This has resulted in a growing body of work on extracting, cleaning, and annotating network data. Unfortunately, much of this work is ad hoc and domain-specific. In this paper, we present the architecture of a data management system that enables efficient, declarative analysis of large-scale information networks. We identify a set of primitives to support the extraction and inference of a network from observational data, and describe a framework that enables a network analyst to easily implement and combine new extraction and analysis techniques and to apply them efficiently to large observation networks. The key insight behind our approach is to decouple, to the extent possible, (a) the operations that require traversing the graph structure (typically the computationally expensive step) from (b) the operations that modify and update the extracted network. We present an analysis language based on Datalog and show how to use it to achieve this decoupling cleanly. We briefly describe our prototype system that supports these abstractions. We also include a preliminary performance evaluation of the system and show that our approach scales well and can efficiently handle a wide spectrum of data cleaning operations on network data.
  • Keywords
    Datalog; data analysis; information networks; data cleaning operations; data management system; declarative analysis; graph structure; noisy information networks; Cleaning; Data mining; Databases; Noise measurement; Prediction algorithms; Semantics; Syntactics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Engineering Workshops (ICDEW), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • Print_ISBN
    978-1-4244-9195-7
  • Electronic_ISBN
    978-1-4244-9194-0
  • Type
    conf
  • DOI
    10.1109/ICDEW.2011.5767619
  • Filename
    5767619