Title :
Fault tolerant data flow using curator — Storm
Author :
Sainik, Lavanya ; Khajuria, Dheeraj
Author_Institution :
Centre of Excellence Mediation & Device, Ericsson India Global Services Pvt. Ltd., Gurgaon, India
Abstract :
Driven by evolving 3GPP (3rd Generation Partnership Project) standards and the advent of Big Data technology, industries such as telecommunication, warehousing and storage, finance and many more must deal with data of huge volume, velocity and variety, and need to be compliant with this evolving technology. There is a huge demand to process both real-time and stored data. In this paper we analyze Storm, an open source real-time distributed processing engine, and suggest an improvement to its fault tolerance mechanism so that it can be used flawlessly for any data processing use case. Vanilla Storm provides guaranteed message processing, but it promises only an “at least once” level of processing. A perfectly fault tolerant system requires an “exactly once” level of processing, and to achieve this Storm provides another framework, Trident, which is built on top of it. Trident provides a transactional spout whose transactional metadata <transaction id, data> is stored in ZooKeeper, which provides distributed coordination; data can thus be replayed across nodes and hardware in case of any failure, timeout or retry. Trident coordinates this transactional information in ZooKeeper through the Apache Curator framework. However, while a per-activity (aggregator/reducer) commit is easy to obtain with the current Trident framework, there is no direct implementation of a single chain-level transaction commit. This paper describes an approach in which, by modifying the existing transactional Trident, a chain-level commit can be obtained using Curator recipes.
Keywords :
Big Data; data flow computing; fault tolerant computing; meta data; public domain software; 3GPP; 3rd generation partnership project; Big Data technology; Vanilla storm; apache curator framework; data processing; distributed coordination; fault tolerance mechanism; fault tolerant data flow; guaranteed message processing; hardware data; node data; open source framework; real time distributed processing engine; transaction id; transactional information; transactional metadata information; transactional spout; trident framework; zookeeper; Distributed databases; Fasteners; Fault tolerance; Fault tolerant systems; Radiation detectors; Real-time systems; Storms; Big data; Fault tolerance; PathChildrenCache; Real time data; Storm; apache curator; batch input; transaction management; transactional spout; zookeeper;
Conference_Title :
Software Engineering and Service Science (ICSESS), 2014 5th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-3278-8
DOI :
10.1109/ICSESS.2014.6933608