• DocumentCode
    3123812
  • Title
    Integrating and Ranking Uncertain Scientific Data
  • Author
    Detwiler, Landon ; Gatterbauer, Wolfgang ; Louie, Brent ; Suciu, Dan ; Tarczy-Hornoch, Peter
  • Author_Institution
    Comput. Sci. & Eng., Univ. of Washington, Seattle, WA
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    1235
  • Lastpage
    1238
  • Abstract
    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show three results. (i) Explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions, though it does not improve the prediction of well-known ones. This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery. (ii) Small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities. (iii) Several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
  • Keywords
    data integrity; data mining; probability; query processing; scientific information systems; BioRank project; exploratory queries; mediator-based data integration systems; probabilistic query evaluation; probabilistic uncertainty models; scientific knowledge discovery; uncertain scientific data; Biological system modeling; Biomedical informatics; Computer science; Data engineering; Databases; Proteins; Query processing; Sequences; USA Councils; Uncertainty; data integration; probabilistic databases; ranking; top-k; uncertain data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.209
  • Filename
    4812509