Title :
Semantic-based intelligent data clean framework for big data
Author :
Jia Wang ; Zhijun Song ; Qian Li ; Jun Yu ; Fei Chen
Author_Institution :
28th Res. Inst., China Electron. Technol. Group Corp., Nanjing, China
Abstract :
To overcome the limitations of existing data cleansing methods when applied to massive data, we propose a generic semantic-based framework that uses a parallelized processing model for effective big data cleansing. We also use an improved Semantic-Based Keyword Matching Algorithm to handle duplicate data. Experimental results show that the parallelized framework, combined with the improved Semantic-Based Keyword Matching Algorithm, identifies duplicates with high recall and precision and performs well on big data cleansing.
Keywords :
Big Data; big data cleansing methods; parallelized processing model; semantic-based intelligent data clean framework; semantic-based keyword matching algorithm; Cleaning; Companies; Data models; Encoding; Real-time systems; Semantics; data cleansing; parallelized processing; semantic-based framework
Conference_Title :
2014 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC)
Conference_Location :
Wuhan
Print_ISBN :
978-1-4799-5352-3
DOI :
10.1109/SPAC.2014.6982731