• DocumentCode
    2182430
  • Title
    Duplicate detection in probabilistic data
  • Author
    Panse, Fabian ; Van Keulen, Maurice ; De Keijzer, Ander ; Ritter, Norbert
  • Author_Institution
    Comput. Sci. Dept., Univ. of Hamburg, Hamburg, Germany
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    179
  • Lastpage
    182
  • Abstract
    Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities.
  • Keywords
    XML; probability; relational databases; XML data; autonomous probabilistic databases; data integration approach; duplicate detection; probabilistic representations; relational data; uncertain data management; Astronomy; Computer science; Couplings; Data models; Electrostatic precipitators; Prototypes; Telescopes; Uncertainty
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-6522-4
  • Electronic_ISBN
    978-1-4244-6521-7
  • Type
    conf
  • DOI
    10.1109/ICDEW.2010.5452759
  • Filename
    5452759