DocumentCode :
117300
Title :
Evaluating Accumulo performance for a scalable cyber data processing pipeline
Author :
Sawyer, Scott M. ; O'Gwynn, B. David
Author_Institution :
MIT Lincoln Lab., Lincoln, MA, USA
fYear :
2014
fDate :
9-11 Sept. 2014
Firstpage :
1
Lastpage :
6
Abstract :
Streaming, big data applications face challenges in creating scalable data flow pipelines, in which multiple data streams must be collected, stored, queried, and analyzed. These data sources are characterized by their volume (in terms of dataset size), velocity (in terms of data rates), and variety (in terms of fields and types). For many applications, distributed NoSQL databases are effective alternatives to traditional relational database management systems. This paper considers a cyber situational awareness system that uses the Apache Accumulo database to provide scalable data warehousing, real-time data ingest, and responsive querying for human users and analytic algorithms. We evaluate Accumulo's ingestion scalability as a function of the number of client processes and servers. We also describe a flexible data model with effective techniques for query planning and query batching to deliver responsive results. Query performance is evaluated in terms of the latency with which the client receives initial result sets. Accumulo performance is measured on a database of up to 8 nodes using real cyber data.
Keywords :
Big Data; SQL; data warehouses; distributed databases; query processing; relational databases; Accumulo ingestion scalability; Apache Accumulo database; big data application; cyber situational awareness system; distributed NoSQL database; query batching; query planning; real-time data ingest; relational database management system; responsive querying; scalable cyber data processing pipeline; scalable data flow pipeline; scalable data warehousing; Indexes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Extreme Computing Conference (HPEC), 2014 IEEE
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4799-6232-7
Type :
conf
DOI :
10.1109/HPEC.2014.7040978
Filename :
7040978