• DocumentCode
    1791650
  • Title

    In unity there is strength: Showcasing a unified big data platform with MapReduce over both object and file storage

  • Author

    Zhang, Rui ; Hildebrand, Dean ; Tewari, Renu

  • Author_Institution
    IBM Res. - Almaden, San Jose, CA, USA
  • fYear
    2014
  • fDate
    27-30 Oct. 2014
  • Firstpage
    960
  • Lastpage
    966
  • Abstract
    Big Data platforms often need to support emerging data sources and applications while accommodating existing ones. Since different data and applications have varying requirements, multiple types of data stores (e.g. file-based and object-based) frequently co-exist in the same solution today without proper integration. Hence cross-store data access, key to effective data analytics, cannot be achieved without laborious application re-programming, prohibitively expensive data migration, and/or costly maintenance of multiple data copies. We address this vital issue by introducing a first unified big data platform over heterogeneous storage. In particular, we present a prototype joining Apache Hadoop MapReduce with OpenStack's open-source object store Swift and IBM's cluster file system GPFS™. A sentiment analysis application using 3 months of real Twitter data is employed to test and showcase our prototype. We have found that our prototype achieves 50% data capacity savings, eliminates data migration overhead, and offers stronger reliability and enterprise support. Through our case study, we have learned important theoretical lessons concerning performance and reliability, as well as practical ones related to platform configuration. We have also identified several potentially high-impact research directions.
  • Keywords
    Big Data; parallel programming; public domain software; social networking (online); Apache Hadoop MapReduce; GPFS IBM cluster file system; OpenStack open-source object store; Swift; Twitter data; cross-store data access; data analytics; data applications; data capacity savings; data migration overhead elimination; data sources; enterprise support; file-based data stores; heterogeneous storage; object-based data stores; platform configuration; reliability analysis; sentiment analysis application; unified Big Data platform; Big data; Portfolios; Protocols; Prototypes; Reliability; Sentiment analysis; Twitter;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (Big Data), 2014 IEEE International Conference on
  • Conference_Location
    Washington, DC
  • Type

    conf

  • DOI
    10.1109/BigData.2014.7004328
  • Filename
    7004328