Title :
On mixing high-speed updates and in-memory queries: A big-data architecture for real-time analytics
Author :
Tao Zhong ; Kshitij A. Doshi ; Xi Tang ; Ting Lou ; Zhongyan Lu ; Hong Li
Abstract :
Up-to-date business intelligence has become a critical differentiator for the modern data-driven, highly engaged enterprise. It requires rapid, continuous integration of new information for subsequent analyses. ETL-based and traditional batch-oriented methods of absorbing changes into a relational database schema take time and are therefore incompatible with the very low-latency demands of real-time analytics. Instead, in-memory clustered stores that employ tunable consistency mechanisms are becoming attractive, since they dispense with the need to transform and move data between storage layouts and tiers. When data is updated infrequently, in-memory approaches such as RDD transformations in Spark can suffice, but as updates become frequent, such approaches need to be extended to support dynamic datasets. This paper describes a few key additional requirements that result from having to support in-memory processing of data while updates proceed concurrently. It then describes the Real-time Analytics Foundation (RAF), an architecture that meets these requirements. Performance of an early implementation of RAF is also described: for an unaudited TPC-H-derived workload, RAF shows a node-to-node scaling ratio of 88% at 8 nodes, and for a query equivalent to Q6 in the TPC-H set, RAF shows a 9x improvement over Hive-Hadoop. The paper also describes two RAF-based solutions that are being put together by two independent software vendors in China.
Keywords :
DP industry; competitive intelligence; query processing; real-time systems; relational databases; China; Hive-Hadoop; RAF; TPC-H; big-data architecture; business intelligence; data-driven highly engaged enterprise; high-speed updates; in-memory queries; real-time analytics foundation; relational database; software vendors; Data handling; Data storage systems; Distributed databases; Information management; Memory management; Real-time systems; Software; Analytics; Big Data; CRUD; Clustering; Low-latency; Real-time; Resilient Distributed Datasets;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691704