• DocumentCode
    169842
  • Title
    Watershed reengineering: Making Streams Programmable
  • Author
    Caetano Rocha, Rodrigo ; Ferreira, Ricardo ; Meira, Wagner ; Guedes, Dorgival
  • Author_Institution
    Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2014
  • fDate
    22-24 Oct. 2014
  • Firstpage
    120
  • Lastpage
    125
  • Abstract
    Most high-performance data processing (aka big-data) systems allow users to express their computation using abstractions (like map-reduce) that simplify the extraction of parallelism from applications. Most frameworks, however, do not allow users to specify how communication must take place: that element is deeply embedded in the run-time system (RTS), making changes hard to implement. In this work we describe our reengineering of the Watershed system, a framework based on the filter-stream paradigm and focused on continuous stream processing. Like other big-data environments, Watershed provided object-oriented abstractions to express computation (filters), but the implementation of streams was an RTS element. By isolating stream functionality into appropriate classes, combination of communication patterns and reuse of common message handling functions (like compression and blocking) become possible. The new architecture even allows the design of new communication patterns, for example, allowing users to choose MPI, TCP, or shared-memory implementations of communication channels as their problem demands. Applications designed for the new interface showed reductions in code size on the order of 50% and above in some cases, with no significant performance penalty.
  • Keywords
    Big Data; application program interfaces; message passing; object-oriented programming; shared memory systems; MPI; RTS element; TCP; big-data systems; communication channels; communication patterns; continuous stream processing; filter-stream paradigm; high-performance data processing systems; message handling functions; object-oriented abstractions; programmable streams; run-time system; shared memory implementations; stream functionality; watershed reengineering; Computational modeling; Computers; Decoding; Distributed databases; Libraries; Ports (Computers); Training; big data; high-performance computing; parallel programming; programming model; stream processing system;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing Workshop (SBAC-PADW), 2014 International Symposium on
  • Conference_Location
    Paris
  • Type
    conf
  • DOI
    10.1109/SBAC-PADW.2014.31
  • Filename
    6972026