• DocumentCode
    3144659
  • Title

    Analysis farm: A cloud-based scalable aggregation and query platform for network log analysis

  • Author

    Wei, Jianwen ; Zhao, Yusu ; Jiang, Kaida ; Xie, Rui ; Jin, Yaohui

  • fYear
    2011
  • fDate
    12-14 Dec. 2011
  • Firstpage
    354
  • Lastpage
    359
  • Abstract
    Network monitoring data provides insight into the operational status of a network. With increasingly sophisticated ways of probing, sampling and recording network activities, the huge amount of monitoring data brings both an opportunity and a challenge for network data analysis. We aim to build a scalable platform, named Analysis Farm, for analyzing network logs. Analysis Farm's targets include fast log aggregation and agile log query. To achieve these goals, storage scalability, computation scalability and query agility must be addressed. Cloud computing and NoSQL technologies meet our needs by providing manageable on-demand hardware resources and novel data storage models. We choose OpenStack, an open-source cloud tool set, for resource provisioning, and MongoDB, an RDBMS-like document-oriented NoSQL system, for log storage and analysis. By combining the scalability of both OpenStack and MongoDB, we build Analysis Farm to be capable of storage scale-out, computation scale-out and agile query. The Analysis Farm prototype in use, consisting of 10 MongoDB servers, aggregates about 3 million log records in a 10-minute interval and handles ad hoc queries effectively in a log database that accumulates more than 400 million records per day. In this paper, we describe Analysis Farm's background, targets, architecture and some experimental results. We believe Analysis Farm will benefit those who work on big-log-style data analysis.
  • Keywords
    SQL; cloud computing; data analysis; data loggers; query processing; storage management; Analysis Farm prototype; MongoDB servers; NoSQL technology; OpenStack; RDBMS-like document-oriented NoSQL system; ad hoc query; agile log query; agile query; analysis farm; big-log-style data analysis; cloud computing; cloud-based scalable aggregation; computation scalability; computation scale-out; data monitoring; data storage models; fast log aggregation; log database; log records; log storage; manageable on-demand hardware resources; network data analysis; network log analysis; network logs; network monitoring data; network operation status; open-source cloud tool set; query agility; query platform; resource provisioning; storage scalability; storage scale-out; Cloud computing; Engines; Hardware; IP networks; Indexes; Scalability; Servers; Big Data; Cloud computing; Log Analysis; NoSQL;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud and Service Computing (CSC), 2011 International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4577-1635-5
  • Electronic_ISBN
    978-1-4577-1636-2
  • Type

    conf

  • DOI
    10.1109/CSC.2011.6138547
  • Filename
    6138547