DocumentCode
1665567
Title
Big Data Pre-processing: A Quality Framework
Author
Taleb, Ikbal ; Dssouli, Rachida ; Serhani, Mohamed Adel
Author_Institution
CIISE, Concordia Univ., Montreal, QC, Canada
fYear
2015
Firstpage
191
Lastpage
198
Abstract
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that the quality of data is maintained at all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes to support data quality profile selection and adaptation. In addition, the model tracks and registers in a data provenance repository the effect of every data transformation that happens in the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The obtained results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so significantly saves on costs and supports accurate data analysis.
Keywords
Big Data; data analysis; EEG dataset; QBD; big data preprocessing; big data processing lifecycle; cleansing process; data provenance repository; data quality profile selection; data quality selection module; data transformation; filtering process; heterogeneous data; integration process; normalization process; quality of big data; Accuracy; Business; Data integration; Distributed databases; Data Quality; pre-processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.35
Filename
7207219