DocumentCode :
262056
Title :
A Practical Approach on Cleaning-Up Large Data Sets
Author :
Barat, Marius ; Prelipcean, Dumitru Bogdan ; Gavrilut, Dragos Teodor
Author_Institution :
Bitdefender Lab., "Al.I. Cuza" Univ., Iaşi, Romania
fYear :
2014
fDate :
22-25 Sept. 2014
Firstpage :
280
Lastpage :
284
Abstract :
In this paper we propose a noise detection system based on similarities between instances. Given a data set whose instances belong to multiple classes, a noise instance is a wrongly classified record. The similarity between differently labeled instances is determined by computing distances between them using several standard metrics. To ensure that this approach is computationally feasible for very large data sets, we compute distances only between pairs of differently labeled instances that have a certain degree of similarity. This speed-up is made possible by a new clustering method, called BDT Clustering, presented in this paper, which is based on a supervised learning algorithm.
Keywords :
data handling; learning (artificial intelligence); pattern classification; pattern clustering; BDT clustering method; computing distances; label instances; large data set cleaning-up; noise detection system; supervised learning algorithm; Clustering algorithms; Clustering methods; Computer science; Machine learning algorithms; Malware; Measurement; Noise; clustering; data mining; decision making; machine learning; noise reduction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2014 16th International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-1-4799-8447-3
Type :
conf
DOI :
10.1109/SYNASC.2014.45
Filename :
7034695