Title :
Development of an extended robust data mining (ERDM) model
Author :
Yang, Le ; Shin, Sangmun ; Choi, Yongsun ; Park, Kyungjin ; Kaewkuekool, Sittichai ; Chantrasa, Ruephuwan ; Lila, Banhan
Author_Institution :
Inje Univ., Kimhae
Abstract :
Most data mining (DM) methods for factor selection reviewed in the literature may identify a number of input factors associated with the response of interest without providing detailed information such as the relationship between the input factors and the response, statistical inferences, or analysis. These DM methods also may not address the robustness of solutions, either through data preprocessing for outliers and missing values or through consideration of uncontrollable noise factors. To address these problems, we propose an extended robust data mining (ERDM) model, which has three main components. First, the proposed ERDM applies an outlier test and the expectation-maximization (EM) algorithm to preprocess the data. Second, it reduces the dimensionality to find the significant factors among a large number of input factors using the correlation-based feature selection (CBFS) method and the best first search (BFS) algorithm. Finally, it applies the theory of robust design to handle the noise factors, using the concept of a surrogate variable and response surface methodology (RSM).
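The dimensionality-reduction step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard correlation-based merit heuristic (mean factor-response correlation rewarded, mean factor-factor correlation penalized), uses Pearson correlation on numeric columns, and substitutes a plain greedy forward search for best first search to keep the example short. The names `pearson`, `cfs_merit`, and `greedy_forward_select` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, response):
    """Correlation-based merit: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    r_cf is the mean absolute factor-response correlation and r_ff the
    mean absolute pairwise correlation among the factors in `subset`.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[j], response)) for j in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (
        sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_select(columns, response):
    """Greedy forward search (a simplified stand-in for best first search).

    Adds the factor that most improves the merit and stops as soon as
    no remaining factor strictly improves it.
    """
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(selected + [j], columns, response), j) for j in remaining]
        merit, j = max(scored, key=lambda t: t[0])
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

Under these assumptions, a factor that duplicates an already selected one does not raise the merit and is left out, which is the redundancy-filtering behavior the correlation-based approach is meant to provide. A full best first search would additionally keep a queue of unexpanded subsets and allow backtracking.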
Keywords :
correlation methods; data mining; data reduction; expectation-maximisation algorithm; feature extraction; response surface methodology; statistical testing; tree searching; best first search algorithm; correlation-based feature selection method; data dimensionality reduction; expectation maximum algorithm; extended robust data mining model; noise factor selection; outlier test; response surface methodology; statistical inference; surrogate variable concept; Data analysis; Data engineering; Data mining; Delta modulation; Educational technology; Electronic mail; Filters; Information analysis; Noise robustness; Testing; Correlation-Based Feature Selection (CBFS); Data Mining (DM); Expectation Maximization (EM) Algorithm; Response Surface Methodology (RSM); Robust Design (RD);
Conference_Title :
Control, Automation and Systems, 2007. ICCAS '07. International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-89-950038-6-2
Electronic_ISBN :
978-89-950038-6-2
DOI :
10.1109/ICCAS.2007.4406581