• DocumentCode
    891318
  • Title

    Efficient Approximate Query Processing in Peer-to-Peer Networks

  • Author

    Arai, Benjamin ; Das, Gautam ; Gunopulos, Dimitrios ; Kalogeraki, Vana

  • Author_Institution
    Univ. of California, Riverside
  • Volume
    19
  • Issue
    7
  • fYear
    2007
  • fDate
    7/1/2007
  • Firstpage
    919
  • Lastpage
    933
  • Abstract
    Peer-to-peer (P2P) databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale ad hoc analysis queries, for example, aggregation queries, on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers; within each peer, the data is often highly correlated; and, moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph, as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
  • Keywords
    Internet; distributed databases; peer-to-peer computing; query processing; random processes; sampling methods; P2P graph; ad hoc aggregation query; adaptive two-phase sampling; approximate answering; approximate query processing; block-level sampling; peer-to-peer database; peer-to-peer network; random sampling; Computer networks; Data analysis; IP networks; Intrusion detection; Music information retrieval; Temperature sensors; Approximation methods; database systems; distributed database query processing; distributed estimation; distributed systems
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2007.1064
  • Filename
    4216308
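
The abstract describes an adaptive two-phase sampling approach built on random walks of the P2P graph. The sketch below is one possible reading of that idea, not the authors' algorithm. It assumes an undirected overlay stored as an adjacency dict, a known total degree (global knowledge a real deployment would have to estimate), a plain sample variance to size the second phase, and a uniform local subsample standing in for block-level sampling. All function and parameter names (walk_sample, peer_estimates, two_phase_sum, jump, max_rows) are hypothetical.

```python
import math
import random
import statistics


def walk_sample(graph, start, count, jump, rng):
    """Random-walk over the overlay, keeping every `jump`-th peer visited."""
    peers, current = [], start
    for _ in range(count):
        for _ in range(jump):
            current = rng.choice(graph[current])
        peers.append(current)
    return peers, current


def peer_estimates(graph, data, peers, max_rows, predicate, rng):
    """One estimate of the global SUM per visited peer.

    A long random walk visits a peer with probability deg / total_degree,
    so each local sum is divided by that probability.
    """
    total_degree = sum(len(nbrs) for nbrs in graph.values())  # assumed known
    estimates = []
    for p in peers:
        rows = data[p]
        if not rows:
            estimates.append(0.0)
            continue
        # Uniform local subsample (a stand-in for block-level sampling).
        sample = rows if len(rows) <= max_rows else rng.sample(rows, max_rows)
        local = sum(v for v in sample if predicate(v)) * len(rows) / len(sample)
        estimates.append(local * total_degree / len(graph[p]))
    return estimates


def two_phase_sum(graph, data, start, pilot_peers, target_std_error,
                  jump=3, max_rows=100, predicate=lambda v: True, rng=None):
    """Phase 1 gauges the spread; phase 2 visits as many more peers as needed."""
    rng = rng or random.Random()
    # Phase 1: small pilot sample of peers (needs pilot_peers >= 2).
    peers, last = walk_sample(graph, start, pilot_peers, jump, rng)
    estimates = peer_estimates(graph, data, peers, max_rows, predicate, rng)
    # Peers needed so that the standard error of the mean meets the target.
    needed = math.ceil(statistics.variance(estimates) / target_std_error ** 2)
    # Phase 2: continue the walk for the remaining peers, if any.
    extra, _ = walk_sample(graph, last, max(0, needed - pilot_peers), jump, rng)
    estimates += peer_estimates(graph, data, extra, max_rows, predicate, rng)
    return sum(estimates) / len(estimates), len(estimates)
```

Calling two_phase_sum(graph, data, start=0, pilot_peers=3, target_std_error=1.0) returns the estimated sum together with the number of peers visited. When the pilot estimates already agree closely, the second phase visits no additional peers.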