• DocumentCode
    2182533
  • Title

    Profiling linked open data with ProLOD

  • Author

    Böhm, Christoph ; Naumann, Felix ; Abedjan, Ziawasch ; Fenz, Dandy ; Grütze, Toni ; Hefenbrock, Daniel ; Pohl, Matthias ; Sonnabend, David

  • Author_Institution
    Hasso-Plattner-Inst., Potsdam, Germany
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    175
  • Lastpage
    178
  • Abstract
    Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), and then focus on and explore the relevant subset.
  • Keywords
    iterative methods; meta data; LOD; ProLOD; RDF; Web based tool; data profiling methods; data type detection; iterative exploration; linked open data; pattern detection; profiling linked open data; value distribution; Data analysis; Data visualization; Labeling; Ontologies; Packaging; Pattern matching; Prototypes; Resource description framework; Semantic Web; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-6522-4
  • Electronic_ISBN
    978-1-4244-6521-7
  • Type

    conf

  • DOI
    10.1109/ICDEW.2010.5452762
  • Filename
    5452762