• DocumentCode
    611047
  • Title
    Bi-Hadoop: Extending Hadoop to Improve Support for Binary-Input Applications
  • Author
    Xiao Yu ; Bo Hong
  • fYear
    2013
  • fDate
    13-16 May 2013
  • Firstpage
    245
  • Lastpage
    252
  • Abstract
    The MapReduce programming model, along with its open-source implementation, Hadoop, has provided a cost-effective solution for many data-intensive applications. Hadoop stores data in a distributed fashion and exploits data locality by assigning tasks to the nodes where the data is stored. Many data-intensive applications, however, require two (or more) inputs for each of their tasks. Such applications pose significant challenges for Hadoop because the inputs to one task often reside on multiple nodes, and Hadoop is unable to discover data locality in this scenario. This often leads to excessive data transfers and significant degradation in application performance. In this paper, we present Bi-Hadoop, an efficient extension of Hadoop that better supports binary-input applications. Bi-Hadoop integrates an easy-to-use user interface, a binary-input aware task scheduler, and a caching subsystem. Extensive experiments show that Bi-Hadoop can significantly improve the execution of binary-input applications by reducing the data transfer overhead, and that it outperforms existing Hadoop by up to 3.3x.
  • Keywords
    cache storage; data handling; public domain software; scheduling; user interfaces; Bi-Hadoop; MapReduce programming model; application performance degradation; binary-input application execution; binary-input aware task scheduler; caching subsystem; data locality; data storage; data transfer overhead reduction; data-intensive application; open-source implementation; task assignment; user interface; Data transfer; Dispatching; Scheduling algorithms; Sparse matrices; User interfaces; Vectors; Data Locality; Hadoop; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on
  • Conference_Location
    Delft
  • Print_ISBN
    978-1-4673-6465-2
  • Type
    conf
  • DOI
    10.1109/CCGrid.2013.56
  • Filename
    6546099