Title :
An Automatic Discovery Framework of Cross-Source Data Inconsistency for Web Big Data
Author :
Sha Yang;Wei Yu;Yahui Hu;Kai Wang;Jun Wang;Shijun Li
Author_Institution :
Sch. of Comput., Wuhan Univ., Wuhan, China
Abstract :
The vigorous growth of big data has created both opportunities and challenges in business and industry. However, Web big data is distributed across diverse sources with multiple data structures, and data from different sources frequently conflict with each other, i.e., cross-source Web big data is inconsistent. In this paper, we propose a state-of-the-art architecture for automatically discovering inconsistency in Web big data. Our contributions are as follows: (1) we classify inconsistency features to formalize inconsistent data and establish an algebraic operation system; (2) we propose three algorithms for automatically discovering inconsistency, namely constraint-based, SDA-based, and HPDM-based methods; and (3) we conduct experiments on a real-world dataset to compare these schemes with an Oracle-based inconsistency detection framework. The empirical results show that our methods outperform the traditional framework in both accuracy and efficiency on Web big data.
Keywords :
"Big data","Data models","Computers","Data mining","Industries","Algorithm design and analysis","Distributed databases"
Conference_Title :
Advanced Cloud and Big Data, 2015 Third International Conference on
Print_ISBN :
978-1-4673-8537-4
DOI :
10.1109/CBD.2015.22