• DocumentCode
    1665567
  • Title

    Big Data Pre-processing: A Quality Framework

  • Author

    Taleb, Ikbal ; Dssouli, Rachida ; Serhani, Mohamed Adel

  • Author_Institution
    CIISE, Concordia Univ., Montreal, QC, Canada
  • fYear
    2015
  • Firstpage
    191
  • Lastpage
    198
  • Abstract
    With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained across all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes to support data quality profile selection and adaptation. In addition, the model tracks and registers in a data provenance repository the effect of every data transformation that occurs in the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so significantly reduces costs and supports accurate data analysis.
  • Keywords
    Big Data; data analysis; EEG dataset; QBD; big data preprocessing; big data processing lifecycle; cleansing process; data provenance repository; data quality profile selection; data quality selection module; data transformation; filtering process; heterogeneous data; integration process; normalization process; quality of big data; Accuracy; Business; Data integration; Distributed databases; Data Quality; pre-processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type

    conf

  • DOI
    10.1109/BigDataCongress.2015.35
  • Filename
    7207219