Title :
Scalable Infrastructures for Data in Motion
Author :
Ediger, David ; McColl, R. ; Poovey, Jason ; Campbell, Daniel
Author_Institution :
Georgia Tech Res. Inst., Atlanta, GA, USA
Abstract :
Analytics applications for reporting and human interaction with big data rely upon scalable frameworks for data ingest, storage, and computation. Batch processing of analytic workloads increases the latency of results and can perform redundant computation. In real-world applications, new data points arrive continuously, and a suite of algorithms must be updated to reflect the changes. Reducing the latency of recomputation by keeping algorithms online and up-to-date enables fast query, experimentation, and drill-down. In this paper, we share our experiences designing and implementing scalable infrastructure around NoSQL databases for social media analytics applications. We propose a new heterogeneous architecture and execution model for streaming data applications that focuses on throughput and modularity.
Keywords :
Big Data; SQL; data analysis; social networking (online); NoSQL databases; analytic workloads; batch processing; big data; data in motion; data ingest; data storage; execution model; heterogeneous architecture; recomputation latency reduction; redundant computation; scalable infrastructures; social media analytics applications; streaming data applications; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data structures; Databases; Media; Servers;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
Conference_Location :
Chicago, IL
DOI :
10.1109/CCGrid.2014.91