• DocumentCode
    659498
  • Title
    Hourglass: A library for incremental processing on Hadoop
  • Author
    Hayes, Michael ; Shah, Shalin
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    742
  • Lastpage
    752
  • Abstract
    Hadoop enables processing of large data sets through its relatively easy-to-use semantics. However, jobs for tasks that could be computed incrementally are often written inefficiently, because managing incremental state is burdensome for the programmer. This paper introduces Hourglass, a library for developing incremental monoid computations on Hadoop. It runs on unmodified Hadoop and provides an accumulator-based interface that lets programmers store and use state across successive runs; the framework ensures that only the necessary subcomputations are performed. It is successfully used at LinkedIn, one of the largest online social networks, for many use cases in dashboarding and machine learning. Hourglass is open source and freely available.
  • Keywords
    Big Data; public domain software; social networking (online); software libraries; Hourglass; LinkedIn; accumulator-based interface; dashboarding; incremental monoid computations; incremental processing library; machine learning; online social networks; unmodified Hadoop; Clocks; Complexity theory; Computational modeling; Databases; Libraries; Programming
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691647
  • Filename
    6691647