Title :
Building a new taxonomy for data discretization techniques
Author :
Bakar, Afarulrazi Abu ; Othman, Zulaiha Ali ; Shuib, Nor Liyana Mohd
Author_Institution :
Data Min. & Optimization Res. Group, Univ. Kebangsaan Malaysia, Bangi, Malaysia
Abstract :
Data preprocessing is an important step in data mining. It is used to resolve various types of problems in a large dataset in order to produce quality data. It consists of four steps, namely data cleaning, integration, reduction and transformation. The literature shows that each preprocessing step comprises various techniques. To produce quality data, a data miner must choose the most appropriate technique at every step of data preprocessing. In this study, we focus on data reduction, particularly data discretization, as one important data preprocessing step. Data reduction involves reducing the data distribution by mapping the range of continuous data onto a smaller set of value ranges or categories. Data discretization plays a major role in reducing the number of intervals of attribute values. Finding an appropriate number of discrete values will improve the performance of data mining models, particularly in terms of classification accuracy. This paper proposes a four-level taxonomy of data discretization, as follows: hierarchical and non-hierarchical; splitting, merging and combination; supervised and unsupervised combinations; and binning, statistic, entropy and other related techniques. The taxonomy is based on a detailed review of previous discretization techniques. More than fifty techniques are investigated, and the structure of the discretization approach is outlined. Guidelines on how to use the proposed taxonomy are also discussed.
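Illustrative sketch (not taken from the paper): the abstract names binning and entropy among the technique families, and distinguishes unsupervised from supervised discretization. The Python below shows one generic instance of each under stated assumptions: equal-width binning as an unsupervised example, and a single entropy-minimising cut point as a supervised splitting example. The function names, the `ages` data and the `labels` are hypothetical and chosen only for illustration; the paper's own techniques may differ.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Unsupervised binning: map each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    # The maximum value would index bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_split(values, labels):
    """Supervised splitting: cut point with the lowest weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between two equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            # Place the cut midway between the two neighbouring values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


# Hypothetical data, for illustration only.
ages = [18, 22, 25, 31, 40, 47, 52, 60]
labels = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

bins = equal_width_bins(ages, 3)
cut = best_entropy_split(ages, labels)
print(bins, cut)
```

The unsupervised function ignores the class labels entirely, whereas the supervised one uses them to choose the boundary; applying the split recursively to each side would give a hierarchical, splitting-style discretizer in the sense of the taxonomy levels listed in the abstract.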
Keywords :
data mining; data reduction; data cleaning; data discretization techniques; data distribution reduction; data integration; data preprocessing; data transformation; Artificial intelligence; Cleaning; Computer science; Entropy; Information technology; Merging; Statistics; Taxonomy; Data Discretization
Conference_Title :
Data Mining and Optimization, 2009. DMO '09. 2nd Conference on
Conference_Location :
Kajang
Print_ISBN :
978-1-4244-4944-6
DOI :
10.1109/DMO.2009.5341896