• DocumentCode
    243692
  • Title

    A Platform for Analysing Stream and Historic Data with Efficient and Scalable Design Patterns

  • Author

    Simmonds, R.M. ; Watson, Paul ; Halliday, J. ; Missier, Paolo

  • Author_Institution
    Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    174
  • Lastpage
    181
  • Abstract
    Social media is an increasingly popular method for people to share information and interact with each other. Analysis of social media data has the potential to provide useful insights in a wide range of domains including social science, advertising and policing. Social media information is produced in real time, and so analysis that can give insights into events as they occur can be particularly valuable. Similarly, analytics platforms providing low-latency query responses can improve the user experience for ad-hoc data exploration on historic data sets. However, the rate at which new data is generated makes it a real challenge to design a system that can meet both of these requirements. This paper describes the design and evaluation of such a system. Firstly, it describes how a meta-analysis of the types of questions that were being asked of Twitter data led to the identification of a small set of queries that could be used to answer the majority of them. Secondly, it describes the design of a scalable platform for answering these and other queries. The architecture is described: it is cloud-based and combines continuous query and NoSQL database technology. Evaluation results are presented which show that the system can scale to process queries on streaming data arriving at the rate of the full Twitter firehose. Experiments show that queries on large repositories of stored historic data can also be answered with low latency. Finally, we present the results of queries that combine both streaming and historic data.
  • Keywords
    cloud computing; data analysis; query processing; social networking (online); Twitter data; ad-hoc data exploration; advertising; continuous query database; historic data sets; low latency query responses; meta-analysis; noSQL database technology; query answering; scalable design patterns; social media data analysis; social science; streaming data; user experience; Databases; Educational institutions; Market research; Media; Radiation detectors; Scalability; Twitter; Complex event processing; Distributed database; NoSQL; Scalability and Social Media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services (SERVICES), 2014 IEEE World Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5068-3
  • Type

    conf

  • DOI
    10.1109/SERVICES.2014.40
  • Filename
    6903262