Title :
A Parallel Algorithm for Datacleansing in Incomplete Information Systems Using MapReduce
Author :
Fei Chen ; Lin Jiang
Author_Institution :
Fac. of Sci., Kunming Univ. of Sci. & Technol., Kunming, China
Abstract :
Data cleansing is an important step in data mining and a key technology for ensuring data quality. Classical data pre-processing techniques are limited when processing massive data with missing information, and they sometimes cannot obtain precise and reasonable results, which leads to low-quality data. To address this, through a close analysis of classical pre-processing and in combination with the MapReduce programming model, a parallel algorithm for data cleansing in incomplete information systems is proposed to process massive data with missing information. Finally, the new algorithm is applied to an incomplete decision information system, and the analysis results show that it is effective.
Keywords :
data handling; information systems; parallel algorithms; parallel programming; MapReduce programming model; data cleansing; data mining; incomplete decision information system; Algorithm design and analysis; Cleaning; Distributed databases; Programming; MapReduce; incomplete information systems; massive data; rough set
Conference_Title :
Computational Intelligence and Security (CIS), 2014 Tenth International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4799-7433-7
DOI :
10.1109/CIS.2014.42