• DocumentCode
    2600668
  • Title

    Uniform Data Sampling from a Peer-to-Peer Network

  • Author

    Datta, Souptik ; Kargupta, H.

  • Author_Institution
    Dept. of Comput. Sci. & Electr. Eng., Univ. of Maryland, Baltimore, MD, USA
  • fYear
    2007
  • fDate
    25-27 June 2007
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    A uniform random sample is often useful in analyzing data. Taking a uniform sample is usually not a problem if all the data resides in one location. However, if the data is distributed in a peer-to-peer (P2P) network with different amounts of data at different peers, collecting a uniform sample of the data becomes a challenging task. Random sampling can be performed using a random walk, but because peers vary in their degree of connectivity and in the amount of data they own, a plain random walk gives a biased sample. In this paper, we propose a random walk-based sampling algorithm that can be used to sample data tuples uniformly from a large, unstructured P2P network. We model the random walk as a Markov chain and derive conditions that bound the length of the random walk necessary to achieve uniformity. A formal communication analysis shows a logarithmic communication cost to discover a uniform data sample.
  • Keywords
    Markov processes; data analysis; peer-to-peer computing; sampling methods; Markov chain; data tuples; formal communication analysis; peer-to-peer network; random walk-based sampling algorithm; uniform data sampling; unstructured P2P network; Distributed databases; Eigenvalues and eigenfunctions; Equations; Nickel; Peer to peer computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 2007. ICDCS '07. 27th International Conference on
  • Conference_Location
    Toronto, ON
  • ISSN
    1063-6927
  • Print_ISBN
    0-7695-2837-3
  • Type

    conf

  • DOI
    10.1109/ICDCS.2007.6238553
  • Filename
    6238553