DocumentCode :
597767
Title :
An enhanced approach on handling missing values using bagging k-NN imputation
Author :
Kumutha, V. ; Palaniammal, S.
Author_Institution :
Dept. of Comput. Sci., D.J. Acad. for Manage. Excellence, Coimbatore, India
fYear :
2013
fDate :
4-6 Jan. 2013
Firstpage :
1
Lastpage :
8
Abstract :
High-dimensional data sets have attracted great interest from researchers in the database community over the past decades. Today's businesses capture vast amounts of data, including digital documents, web pages, customer databases, hyper-spectral imagery, social networks, gene arrays, proteomics data, neurobiological signals, high-dimensional dynamical systems, sensor networks, financial transactions and traffic statistics, thereby generating massive high-dimensional data sets. DNA microarrays make it possible to identify the expression levels of thousands of genes during a biological process. The difficulty with microarrays is that gene expression is measured for thousands of genes (features) from only tens to hundreds of samples. Microarray data often contain missing values that may affect subsequent analysis. In this paper, a novel imputation approach that combines k-NN with bagging is proposed to handle missing values. The experimental results show that the proposed method outperforms other methods in terms of the distance and density of clusters. Bagging thus enhances the performance of traditional k-NN imputation.
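The abstract does not spell out the algorithm, so the following Python sketch only illustrates one generic way to combine k-NN imputation with bagging; it should not be read as the authors' exact method. It assumes NumPy, a samples-by-features matrix with NaN marking missing cells, a root-mean-square distance over the features two rows share, and a plain average of the per-bag estimates. The function names `knn_estimates` and `bagged_knn_impute` and the parameters `k`, `n_bags` and `seed` are hypothetical.

```python
import numpy as np


def knn_estimates(data, donor_rows, k):
    """Estimate each missing cell from the k nearest rows in donor_rows."""
    missing = np.isnan(data)
    est = np.full(data.shape, np.nan)
    for i in np.where(missing.any(axis=1))[0]:
        pool = donor_rows[donor_rows != i]  # a row never donates to itself
        dists = np.full(len(pool), np.inf)
        for p, j in enumerate(pool):
            shared = ~missing[i] & ~missing[j]  # features observed in both rows
            if shared.any():
                diff = data[i, shared] - data[j, shared]
                dists[p] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists, kind="stable")
        for col in np.where(missing[i])[0]:
            # Nearest donors that actually have this feature observed.
            donors = [pool[p] for p in order
                      if np.isfinite(dists[p]) and not missing[pool[p], col]][:k]
            if donors:
                est[i, col] = data[donors, col].mean()
    return est


def bagged_knn_impute(data, k=3, n_bags=25, seed=0):
    """Average k-NN estimates over bootstrap samples of the donor rows."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(data)
    stacked = np.stack([
        knn_estimates(data, rng.integers(0, n, size=n), k)  # one bootstrap bag
        for _ in range(n_bags)
    ])
    counts = np.sum(~np.isnan(stacked), axis=0)  # bags that gave an estimate
    sums = np.nansum(stacked, axis=0)
    col_means = np.nanmean(data, axis=0)  # fallback when no bag had a donor
    bagged = np.where(counts > 0, sums / np.maximum(counts, 1), col_means)
    return np.where(np.isnan(data), bagged, data)


if __name__ == "__main__":
    # Toy matrix: 5 samples, 3 features, two missing cells.
    X = np.array([
        [1.0, 2.0, 3.0],
        [1.1, 2.1, np.nan],
        [0.9, 1.9, 2.9],
        [5.0, 6.0, 7.0],
        [5.1, np.nan, 7.1],
    ])
    print(bagged_knn_impute(X, k=2, n_bags=20, seed=1))
```

Averaging over bootstrap donor pools is meant to make each imputed value less sensitive to any single neighbour; observed cells are returned unchanged.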
Keywords :
biology computing; data handling; genetics; lab-on-a-chip; learning (artificial intelligence); pattern clustering; DNA microarray; Web pages-customer databases; bagging k-NN imputation; bagging method; biological process; business; cluster density; cluster distance; database community; digital documents; financial transactions; gene arrays; genes expression levels; high dimensional data sets; high dimensional dynamical systems; hyper-spectral imagery; microarray data; missing values handling; neurobiological signals; proteomics data; sensor networks; social networks; traffic statistics; Bagging; Classification algorithms; Clustering algorithms; Computers; Correlation; Databases; Gene expression; bagging; clustering; microarray; missing value;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Communication and Informatics (ICCCI), 2013 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4673-2906-4
Type :
conf
DOI :
10.1109/ICCCI.2013.6466301
Filename :
6466301