• DocumentCode
    155320
  • Title
    A novel cloud based elastic framework for big data preprocessing
  • Author
    Dawelbeit, Omer ; McCrindle, Rachel
  • Author_Institution
    Sch. of Syst. Eng., Univ. of Reading, Reading, UK
  • fYear
    2014
  • fDate
    25-26 Sept. 2014
  • Firstpage
    23
  • Lastpage
    28
  • Abstract
    A number of analytical big data services based on the cloud computing paradigm, such as Amazon Redshift and Google BigQuery, have recently emerged. These services are based on columnar databases rather than traditional Relational Database Management Systems (RDBMS) and are able to analyse massive datasets in mere seconds. This has led many organisations to retain and analyse their massive log, sensor or marketing datasets, which were previously discarded because they could be neither stored nor analysed. Although these big data services have addressed the issue of big data analysis, efficiently de-normalising and preparing this data into a format that can be imported into these services remains a challenge. This paper describes and implements a novel, generic and scalable cloud-based elastic framework for Big Data Preprocessing (BDP). Since the approach described by this paper is entirely based on cloud computing, it is also possible to measure the overall cost incurred by these preprocessing activities.
  • Keywords
    Big Data; cloud computing; data analysis; relational databases; Amazon Redshift; BDP; Google BigQuery; RDBMS; analytical big data services; big data analysis; big data preprocessing; cloud based elastic framework; cloud computing paradigm; columnar databases; marketing datasets; massive logs; relational database management systems; Big data; Cloud computing; Computer science; Educational institutions; Google; Program processors; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Science and Electronic Engineering Conference (CEEC), 2014 6th
  • Conference_Location
    Colchester
  • Type
    conf
  • DOI
    10.1109/CEEC.2014.6958549
  • Filename
    6958549