• DocumentCode
    140972
  • Title
    Mars: Real-time spatio-temporal queries on microblogs
  • Author
    Magdy, Ahmed ; Aly, Ahmed M. ; Mokbel, Mohamed F. ; Elnikety, Sameh ; He, Yuxiong ; Nath, Siddhartha
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
  • fYear
    2014
  • fDate
    March 31 2014-April 4 2014
  • Firstpage
    1238
  • Lastpage
    1241
  • Abstract
    The Mars demonstration exploits microblog location information to support a wide variety of important spatio-temporal queries on microblogs. Supported queries include range, nearest-neighbor, and aggregate queries. Mars operates in a challenging environment where microblog streams arrive at high rates. Mars distinguishes itself with three novel contributions: (1) Efficient in-memory digestion/expiration techniques that can handle arrival rates of up to 64,000 microblogs/sec. This also includes highly accurate and efficient hopping-window-based aggregation of incoming microblog keywords. (2) Smart memory optimization and load shedding techniques that adjust in-memory contents based on the expected query load, trading a slight, bounded accuracy loss for significant storage savings. (3) Scalable real-time query processing that exploits Zipf-distributed microblog data for efficient top-k aggregate query processing. In addition, Mars employs a scalable real-time nearest-neighbor and range query processing module that uses various pruning techniques to serve heavy query workloads in real time. Mars is demonstrated using a stream of real tweets obtained from the Twitter firehose with a production query workload obtained from Bing web search. We show that Mars serves incoming queries with an average latency of less than 4 msec and with 99% answer accuracy while saving up to 70% of storage overhead for different query loads.
  • Keywords
    Internet; query processing; social networking (online); Bing Web search; Mars; Twitter firehose; Zipf distributed microblogs data; aggregate queries; heavy query workloads; hopping-window based aggregation; in-memory digestion-expiration techniques; load shedding techniques; microblogs keywords; microblogs location information; nearest-neighbor queries; production query workload; pruning techniques; range queries; real-time query processing; real-time spatio-temporal queries; smart memory optimization; top-k aggregate query processing; Aggregates; Indexes; Mars; Memory management; Query processing; Real-time systems; Twitter;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2014 IEEE 30th International Conference on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/ICDE.2014.6816750
  • Filename
    6816750
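
The abstract mentions hopping-window based aggregation of microblog keywords and top-k aggregate queries but gives no implementation details. The following is only a minimal illustrative sketch of the general idea, not the Mars implementation: the class name `HoppingWindowCounter`, its parameters `window` and `hop`, and the bucket layout are all assumptions introduced here. It assumes timestamps arrive in non-decreasing order and that the window length is a multiple of the hop length.

```python
from collections import Counter, deque
import heapq


class HoppingWindowCounter:
    """Hypothetical keyword counter over a window that advances in fixed hops."""

    def __init__(self, window: int, hop: int):
        if window % hop != 0:
            raise ValueError("window must be a multiple of hop")
        self.hop = hop
        self.max_buckets = window // hop
        self.buckets = deque()   # (bucket_id, Counter) pairs, oldest first
        self.totals = Counter()  # aggregated counts over all live buckets

    def _expire(self, current_bucket: int) -> None:
        # Drop whole buckets that have fallen out of the window.
        oldest_live = current_bucket - self.max_buckets + 1
        while self.buckets and self.buckets[0][0] < oldest_live:
            _, old_counts = self.buckets.popleft()
            # In-place Counter subtraction also discards keys that reach zero.
            self.totals -= old_counts

    def add(self, timestamp: float, keywords) -> None:
        # Assumption: timestamps are in seconds and non-decreasing.
        bucket_id = int(timestamp // self.hop)
        self._expire(bucket_id)
        if not self.buckets or self.buckets[-1][0] != bucket_id:
            self.buckets.append((bucket_id, Counter()))
        keywords = list(keywords)
        self.buckets[-1][1].update(keywords)
        self.totals.update(keywords)

    def top_k(self, k: int):
        # Return the k most frequent keywords currently in the window.
        return heapq.nlargest(k, self.totals.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    counter = HoppingWindowCounter(window=60, hop=20)
    counter.add(0, ["mars", "icde"])
    counter.add(25, ["mars"])
    counter.add(45, ["mars", "twitter"])
    print(counter.top_k(1))
```

Expiring whole buckets rather than individual microblogs keeps the per-arrival cost low, which is one plausible reason to prefer hopping windows when arrival rates are high; whether Mars makes the same trade-off is not stated in this record.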