Title :
A Big Data Online Cleaning Algorithm Based on Dynamic Outlier Detection
Author :
Yinglong Diao;Ke-Yan Liu;Xiaoli Meng;Xueshun Ye;Kaiyuan He
Author_Institution :
Power Distrib. Dept., China Electr. Power Res. Inst., Beijing, China
Abstract :
To effectively clean large-scale, mixed, and inaccurate monitoring or collected data, reduce data-caching costs, and ensure consistent deviation detection on the time-series data of each cycle, this paper proposes a big data online cleaning algorithm based on dynamic outlier detection. The cleaning method builds on density-based local outlier detection: clusters are sampled uniformly to thin out the Euclidean distance matrix, and some corrections are retained and carried into the next cleaning cycle. This prevents a single sample from biasing the overall cleaning result and reduces the computation required once cleaning has reached a stable state, which greatly increases speed. Finally, a distributed implementation of the online cleaning algorithm on the Hadoop platform is presented.
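The density-based local outlier detection step named in the abstract can be illustrated with the standard Local Outlier Factor (LOF). The Python sketch below is not the authors' algorithm. It assumes LOF as the density measure, computes a full pairwise Euclidean distance matrix instead of the sampled, thinned matrix the abstract describes, and models the cycle-to-cycle carry-over with a hypothetical `clean_cycle` helper that keeps a uniform random sample of inliers for the next cycle. The parameters `k`, `threshold`, and `keep` are illustrative assumptions.

```python
import math
import random


def euclidean_matrix(points):
    """Full pairwise Euclidean distance matrix (O(n^2) time and memory)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def lof_scores(points, k):
    """Simplified Local Outlier Factor: exactly k neighbours, ties ignored.

    Scores near 1 indicate a point as dense as its neighbours; scores well
    above 1 indicate a point in a sparser region (a likely outlier).
    """
    d = euclidean_matrix(points)
    n = len(points)
    # k nearest neighbours of each point, excluding the point itself
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
             for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [d[i][neigh[i][-1]] for i in range(n)]
    # local reachability density = k / sum of reachability distances
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], d[i][j]) for j in neigh[i])
        lrd.append(len(neigh[i]) / max(reach, 1e-12))  # guard against duplicates
    # LOF = mean neighbour density divided by the point's own density
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


def clean_cycle(new_points, carried, k=3, threshold=1.5, keep=5, seed=0):
    """Hypothetical single cleaning cycle.

    New points are scored together with points carried over from the
    previous cycle; new points whose LOF exceeds `threshold` are dropped,
    and a uniform random sample of `keep` inliers is carried forward.
    Requires len(carried) + len(new_points) > k.
    """
    pool = list(carried) + list(new_points)
    scores = lof_scores(pool, k)
    offset = len(carried)
    cleaned = [p for p, s in zip(new_points, scores[offset:]) if s <= threshold]
    rng = random.Random(seed)
    next_carry = rng.sample(cleaned, min(keep, len(cleaned)))
    return cleaned, next_carry


if __name__ == "__main__":
    cycle1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
    cleaned, carry = clean_cycle(cycle1, carried=[])
    print(cleaned)  # (5.0, 5.0) is dropped as an outlier
    cleaned2, carry2 = clean_cycle([(0.02, 0.08), (9.0, -9.0)], carried=carry)
    print(cleaned2)  # only (0.02, 0.08) survives
```

Scoring each new batch together with carried-over inliers gives a small batch a density reference from earlier cycles, which is one plausible reading of retaining corrections into the next cycle. The full O(n²) distance matrix used here is the cost that the paper's uniform cluster sampling is meant to reduce.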
Keywords :
"Cleaning","Heuristic algorithms","Distributed databases","Euclidean distance","Data mining","Detection algorithms","Big data"
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2015 International Conference on
DOI :
10.1109/CyberC.2015.68