DocumentCode :
2210412
Title :
Testing the Significance of Patterns in Data with Cluster Structure
Author :
Vuokko, Niko ; Kaski, Petteri
Author_Institution :
Sch. of Sci. & Technol., HUT Aalto Univ., Helsinki, Finland
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
1097
Lastpage :
1102
Abstract :
Clustering is one of the basic operations in data analysis, and the cluster structure of a dataset often has a marked effect on observed patterns in data. Testing whether a data mining result is implied by the cluster structure can give substantial information on the formation of the dataset. We propose a new method for empirically testing the statistical significance of patterns in real-valued data in relation to the cluster structure. The method relies on principal component analysis and is based on the general idea of decomposing the data for the purpose of isolating the null model. We evaluate the performance of the method and the information it provides on various real datasets. Our results show that the proposed method is robust and provides nontrivial information about the origin of patterns in data, such as the source of classification accuracy and the observed correlations between attributes.
Keywords :
data analysis; data mining; pattern clustering; principal component analysis; classification accuracy; cluster structure; dataset formation; nontrivial information; observed correlation; statistical significance; clustering; randomization; significance testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.61
Filename :
5694091