• DocumentCode
    3607812
  • Title

    BigProvision: a provisioning framework for big data analytics

  • Author

    Huan Li ; Kejie Lu ; Shicong Meng

  • Volume
    29
  • Issue
    5
  • fYear
    2015
  • Firstpage
    50
  • Lastpage
    56
  • Abstract
    In the past few years, big data has attracted significant attention, and many analytics platforms, such as Hadoop, have been developed to enable the analysis of massive data. Nevertheless, it is still very challenging to provision, let alone optimize, a comprehensive system that includes various aspects, from the computing infrastructure to the analytics programs. To tackle this challenge, in this article, we propose a novel provisioning framework, BigProvision, to provision big data analytics systems. The main idea of the framework is to first evaluate and model the performance of different big data analytics approaches, given a set of sample data and various analytics requirements, such as the expected results, budget, response time, and so on. Based on the evaluation and modeling results, BigProvision can generate a provisioning configuration that can be used to configure the whole system for big data analytics. To evaluate the performance of the proposed framework, we develop an experimental prototype that supports three analytics platforms, Hadoop, Spark, and GraphLab. Our experiments show that for the classic PageRank analysis, both GraphLab and Spark can outperform Hadoop under different requirements. Moreover, by modeling the results, our prototype can determine the expected settings, such as the number of machines and network capacity, for the system that shall handle the complete data set. The prototype and experiments demonstrate that the proposed framework has great potential to facilitate the provision and optimization of future big data analytics systems.
  • Keywords
    Big Data; data analysis; parallel processing; Big Data analytics; BigProvision; GraphLab; Hadoop; Spark; provisioning framework; Algorithm design and analysis; Analytical models; Big data; Cloud computing; Computational modeling; Data models
  • fLanguage
    English
  • Journal_Title
    Network, IEEE
  • Publisher
    IEEE
  • ISSN
    0890-8044
  • Type

    jour

  • DOI
    10.1109/MNET.2015.7293305
  • Filename
    7293305