• DocumentCode
    2688479
  • Title

    Watershed: A High Performance Distributed Stream Processing System

  • Author

    De Souza Ramos, Thatyene Louise Alves ; Oliveira, Rodrigo Silva ; De Carvalho, Ana Paula ; Ferreira, Renato Antônio Celso ; Meira, Wagner, Jr.

  • Author_Institution
    Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2011
  • fDate
    26-29 Oct. 2011
  • Firstpage
    191
  • Lastpage
    198
  • Abstract
    The task of extracting information from datasets that grow larger on a daily basis, such as those collected from the web, is an increasing challenge, but also provides more interesting insights and analysis. Current analyses have gone beyond content and now focus on tracking and understanding users' relationships and interactions. Such computation is intensive both in terms of the processing demand imposed by the algorithms and the sheer amount of data that has to be handled. In this paper we introduce Watershed, a distributed computing framework designed to support the analysis of very large data streams online and in real time. Data are obtained from streams by the system's processing components, transformed, and directed to other streams, creating large flows of information. The processing components are decoupled from each other and their connections are strictly data-driven. They can be dynamically inserted and removed, providing an environment in which different applications can share intermediate results or cooperate toward a global purpose. Our experiments demonstrate the flexibility in creating a set of data analysis algorithms and their composition into a powerful stream analysis environment.
  • Keywords
    data analysis; distributed processing; Watershed; data analysis algorithms; distributed computing framework; high performance distributed stream processing system; information extraction; online data streams; Computer architecture; Data analysis; Distributed databases; Libraries; Parallel processing; XML; Data-driven architectures; Distributed systems; Dynamic application topology; High-performance computing; Stream processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on
  • Conference_Location
    Vitoria, Espirito Santo
  • ISSN
    1550-6533
  • Print_ISBN
    978-1-4577-2050-5
  • Type

    conf

  • DOI
    10.1109/SBAC-PAD.2011.31
  • Filename
    6106022