DocumentCode :
2851313
Title :
Correlation preserving discretization
Author :
Mehta, Sameep ; Parthasarathy, Srinivasan ; Yang, Hui
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., USA
fYear :
2004
fDate :
1-4 Nov. 2004
Firstpage :
479
Lastpage :
482
Abstract :
Discretization is a crucial preprocessing primitive for a variety of data warehousing and mining tasks. In this article we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate datasets. The algorithm leverages the underlying correlation structure in the dataset to obtain the discrete intervals, and ensures that the inherent correlations are preserved. The approach also extends easily to datasets containing missing values. We demonstrate the efficacy of the approach on real datasets and as a preprocessing step for both classification and frequent item set mining tasks. We also show that the intervals are meaningful and can uncover hidden patterns in data.
Keywords :
data mining; data warehouses; principal component analysis; PCA-based unsupervised algorithm; classification; correlation preserving discretization; correlation structure; data warehousing; frequent item set mining; missing data; multivariate dataset; unsupervised discretization; Classification algorithms; Classification tree analysis; Computer science; Data engineering; Data preprocessing; Decision trees; Discrete transforms; Itemsets; Warehousing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN :
0-7695-2142-8
Type :
conf
DOI :
10.1109/ICDM.2004.10007
Filename :
1410340