• DocumentCode
    3144931
  • Title

    Hyracks: A flexible and extensible foundation for data-intensive computing

  • Author

    Borkar, Vinayak ; Carey, Michael ; Grover, Raman ; Onose, Nicola ; Vernica, Rares

  • Author_Institution
    Comput. Sci. Dept., Univ. of California, Irvine, CA, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    1151
  • Lastpage
    1162
  • Abstract
    Hyracks is a new partitioned-parallel software platform designed to run data-intensive computations on large shared-nothing clusters of computers. Hyracks allows users to express a computation as a DAG of data operators and connectors. Operators operate on partitions of input data and produce partitions of output data, while connectors repartition operators' outputs to make the newly produced partitions available at the consuming operators. We describe the Hyracks end user model, for authors of dataflow jobs, and the extension model for users who wish to augment Hyracks' built-in library with new operator and/or connector types. We also describe our initial Hyracks implementation. Since Hyracks is in roughly the same space as the open source Hadoop platform, we compare Hyracks with Hadoop experimentally for several different kinds of use cases. The initial results demonstrate that Hyracks has significant promise as a next-generation platform for data-intensive applications.
  • Keywords
    data handling; parallel programming; DAG; Hyracks end user model; data connectors; data operators; data-intensive computing; open source Hadoop platform; partitioned-parallel software platform; Computational modeling; Computer architecture
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2011.5767921
  • Filename
    5767921