• DocumentCode
    2985233
  • Title
    Assessing the Significance of Data Mining Results on Graphs with Feature Vectors
  • Author
    Günnemann, Stephan ; Dao, Phuong ; Jamali, Mohsin ; Ester, Martin
  • Author_Institution
    RWTH Aachen Univ., Aachen, Germany
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    270
  • Lastpage
    279
  • Abstract
    Assessing the significance of data mining results is an important step in the knowledge discovery process. While results might appear interesting at first glance, they can often be explained by already known characteristics of the data. Randomization is an established technique for significance testing, and methods to assess data mining results on vector data or network data have been proposed. In many applications, however, both sources are given simultaneously. Since these sources are rarely independent of each other but highly correlated, naively applying existing randomization methods to each source separately is questionable. In this work, we present a method to assess the significance of mining results on graphs with binary feature vectors. We propose a novel null model that preserves correlation information between both sources. Our randomization exploits adaptive Metropolis sampling and interweaves attribute randomization and graph randomization steps. In thorough experiments, we demonstrate the application of our technique. Our results indicate that while simultaneously using both sources is beneficial, often one source of information is dominant for determining the mining results.
  • Keywords
    data mining; graph theory; sampling methods; vectors; adaptive Metropolis sampling; attribute randomization; binary feature vector; correlation information; data mining result; graph randomization; information source; knowledge discovery process; network data; randomization technique; significance testing; vector data; Clustering algorithms; Correlation; Data mining; Data models; Markov processes; Testing; Vectors; data mining; graph; network; randomization; significance testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2012.70
  • Filename
    6413896