• DocumentCode
    2710196
  • Title

    Mining Large Networks with Subgraph Counting

  • Author

    Bordino, Ilaria ; Donato, Debora ; Gionis, Aristides ; Leonardi, Stefano

  • Author_Institution
    Sapienza Univ. di Roma, Rome
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    737
  • Lastpage
    742
  • Abstract
    The problem of mining frequent patterns in networks has many applications, including analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the number of subgraphs contained in a large network. In this paper we develop data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks. We use the frequency of occurrence of all subgraphs to prove their significance in order to characterize different kinds of networks: we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when performing clustering based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization.
  • Keywords
    data mining; data structures; directed graphs; network theory (graphs); pattern clustering; biological database indexing; complex network; data representation; data-stream algorithm; directed network; frequent pattern mining; graph clustering; graphical database indexing; social network; subgraph counting; swap randomization; undirected network; Biological system modeling; Clustering algorithms; Complex networks; Data mining; Databases; Frequency; Indexing; Information systems; Large-scale systems; Pattern analysis; Streaming algorithms; graph algorithms; network characterization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type

    conf

  • DOI
    10.1109/ICDM.2008.109
  • Filename
    4781171