DocumentCode
2985233
Title
Assessing the Significance of Data Mining Results on Graphs with Feature Vectors
Author
Gunnemann, Stephan ; Phuong Dao ; Jamali, Mohsin ; Ester, Martin
Author_Institution
RWTH Aachen Univ., Aachen, Germany
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
270
Lastpage
279
Abstract
Assessing the significance of data mining results is an important step in the knowledge discovery process. While results might appear interesting at first glance, they can often be explained by already known characteristics of the data. Randomization is an established technique for significance testing, and methods to assess data mining results on vector data or network data have been proposed. In many applications, however, both sources are given simultaneously. Since these sources are rarely independent of each other but rather highly correlated, naively applying existing randomization methods to each source separately is questionable. In this work, we present a method to assess the significance of mining results on graphs with binary feature vectors. We propose a novel null model that preserves correlation information between both sources. Our randomization uses adaptive Metropolis sampling and interweaves attribute randomization and graph randomization steps. In thorough experiments, we demonstrate the application of our technique. Our results indicate that while using both sources simultaneously is beneficial, one source of information is often dominant in determining the mining results.
Keywords
data mining; graph theory; sampling methods; vectors; adaptive Metropolis sampling; attribute randomization; binary feature vector; correlation information; data mining result; graph randomization; information source; knowledge discovery process; network data; randomization technique; significance testing; vector data; Clustering algorithms; Correlation; Data models; Markov processes; Testing; graph; network; randomization
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
ISSN
1550-4786
Print_ISBN
978-1-4673-4649-8
Type
conf
DOI
10.1109/ICDM.2012.70
Filename
6413896