DocumentCode :
3717437
Title :
Data veracity estimation with ensembling truth discovery methods
Author :
Laure Berti-Équille
Author_Institution :
Qatar Computing Research Institute, Doha, Qatar
fYear :
2015
Firstpage :
2628
Lastpage :
2636
Abstract :
Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and return, as outputs, a veracity label and a confidence score for each data value, along with the trustworthiness score of each source claiming it. Although a plethora of methods has been proposed, it is unlikely that any single technique dominates all others across all data sets. Furthermore, the performance evaluation of the methods depends entirely on the availability of labeled ground truth data (i.e., data whose veracity has been manually checked). In the context of big data, acquiring the complete ground truth data is out of reach. In this paper, we propose an ensembling method that mitigates the two problems of method selection and ground truth data sparsity. Our approach combines the results of a set of truth discovery methods, and preliminary experiments suggest that it improves the quality performance over the single methods when samples of ground truth data are used.
Keywords :
"Big data","Estimation","Google","Measurement","Conferences","Electronic mail","Context"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7364062
Filename :
7364062