DocumentCode
2688479
Title
Watershed: A High Performance Distributed Stream Processing System
Author
De Souza Ramos, Thatyene Louise Alves ; Oliveira, Rodrigo Silva ; De Carvalho, Ana Paula ; Ferreira, Renato Antônio Celso ; Meira, Wagner, Jr.
Author_Institution
Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
fYear
2011
fDate
26-29 Oct. 2011
Firstpage
191
Lastpage
198
Abstract
The task of extracting information from datasets that grow larger on a daily basis, such as those collected from the web, is an increasing challenge, but it also enables more interesting insights and analyses. Current analyses have gone beyond content and now focus on tracking and understanding users' relationships and interactions. Such computation is intensive in terms of both the processing demand imposed by the algorithms and the sheer amount of data that has to be handled. In this paper we introduce Watershed, a distributed computing framework designed to support the analysis of very large data streams online and in real time. Data are obtained from streams by the system's processing components, transformed, and directed to other streams, creating large flows of information. The processing components are decoupled from each other and their connections are strictly data-driven. They can be dynamically inserted and removed, providing an environment in which different applications can share intermediate results or cooperate toward a global purpose. Our experiments demonstrate the flexibility in creating a set of data analysis algorithms and composing them into a powerful stream analysis environment.
Keywords
data analysis; distributed processing; Watershed; data analysis algorithms; distributed computing framework; high performance distributed stream processing system; information extraction; online data streams; Computer architecture; Data analysis; Distributed databases; Libraries; Parallel processing; XML; Data-driven architectures; Distributed systems; Dynamic application topology; High-performance computing; Stream processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on
Conference_Location
Vitoria, Espirito Santo
ISSN
1550-6533
Print_ISBN
978-1-4577-2050-5
Type
conf
DOI
10.1109/SBAC-PAD.2011.31
Filename
6106022