• DocumentCode
    967759
  • Title
    Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network
  • Author
    Das, Kamalika ; Bhaduri, Kanishka ; Liu, Kun ; Kargupta, Hillol
  • Author_Institution
    Univ. of Maryland Baltimore County, Baltimore
  • Volume
    20
  • Issue
    4
  • fYear
    2008
  • fDate
    4/1/2008
  • Firstpage
    475
  • Lastpage
    488
  • Abstract
    The inner product measures how closely two feature vectors are related. It is an important primitive for many popular data mining tasks, such as clustering, classification, correlation computation, and decision tree construction. If the entire data set is available at a single site, then computing the inner product matrix and identifying its top entries (in terms of magnitude) is trivial. However, in many real-world scenarios data is distributed across many locations, and transmitting it to a central server would be communication-intensive and not scalable. This paper presents an approximate local algorithm for identifying the top-l inner products among pairs of feature vectors in a large asynchronous distributed environment such as a peer-to-peer (P2P) network. We develop a probabilistic algorithm for this purpose using order statistics and the Hoeffding bound. We present experimental results that show the effectiveness and scalability of the algorithm. Finally, we demonstrate an application of this technique to interest-based community formation in a P2P environment.
  • Keywords
    data mining; peer-to-peer computing; probability; Hoeffding bound; asynchronous distributed environment; distributed identification; inner product elements; inner product matrix; local algorithm; order statistics; peer-to-peer network; probabilistic algorithm; Algorithms for data and knowledge management; Data mining; Knowledge management applications; Mining methods and algorithms; Probabilistic algorithms
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2007.190714
  • Filename
    4378380
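For orientation, a minimal centralized sketch in Python of the two ingredients the abstract names. It is not the authors' distributed P2P algorithm. `top_l_inner_products` is the "trivial" single-site baseline: compute every pairwise inner product and keep the l largest in magnitude. `hoeffding_sample_size` is the textbook Hoeffding bound solved for the number of samples n that keeps a sample mean of values in a range of width `value_range` within eps of its expectation with probability at least 1 - delta. Both function names, the input layout (each feature given as a list of its values over all records), and the parameter names are illustrative assumptions.

```python
import heapq
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def top_l_inner_products(
    features: Sequence[Sequence[float]], l: int
) -> List[Tuple[int, int, float]]:
    """Centralized baseline (hypothetical helper): return the l pairs (i, j), i < j,
    whose inner product <f_i, f_j> is largest in absolute value, as (i, j, value)."""

    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(u, v))

    # Every unordered feature pair; this needs all data at one site,
    # which is exactly what the distributed setting rules out.
    pairs = (
        (i, j, dot(features[i], features[j]))
        for i, j in combinations(range(len(features)), 2)
    )
    # Rank by magnitude, as in "top (in terms of magnitude) entries".
    return heapq.nlargest(l, pairs, key=lambda t: abs(t[2]))


def hoeffding_sample_size(eps: float, delta: float, value_range: float = 1.0) -> int:
    """Smallest n with 2 * exp(-2 * n * eps**2 / value_range**2) <= delta
    (two-sided Hoeffding bound for a mean of n independent bounded samples)."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))
```

With features `[[1, 2, 3], [2, 0, 1], [-3, -2, 0], [0, 0, 4]]` and l = 2, the baseline returns `[(0, 3, 12), (0, 2, -7)]`, and `hoeffding_sample_size(0.1, 0.05)` gives 185.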