  • DocumentCode
    1556624
  • Title
    Efficient Evaluation of Continuous Text Search Queries
  • Author
    Mouratidis, Kyriakos; Pang, HweeHwa
  • Author_Institution
    Sch. of Inf. Syst., Singapore Manage. Univ., Singapore, Singapore
  • Volume
    23
  • Issue
    10
  • fYear
    2011
  • Firstpage
    1469
  • Lastpage
    1482
  • Abstract
    Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
  • Keywords
    query processing; text analysis; continuous text search queries; document arrival rates; document traffic; expiration events; incremental threshold-based method; inverted file principle; ranked result list; text filtering server; text monitoring applications; Dictionaries; Electronic mail; Indexes; Maintenance engineering; Monitoring; Query processing; Servers; Continuous queries; document streams; text filtering
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2011.125
  • Filename
    5887333
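
  • Illustrative sketch
    The setting described in the abstract (an in-memory inverted index over a sliding window of documents, with each query's ranked result list updated on document arrival and expiration) can be illustrated with a small Python sketch. This is not the authors' eager or lazy algorithm: it is a naive baseline that uses only each query's current k-th score as a threshold and re-searches the index when a result document expires. The class name `ContinuousTopK`, the count-based window, the dot-product similarity, and all method names are assumptions made for illustration.

```python
from collections import defaultdict, deque
import heapq


class ContinuousTopK:
    """Naive sliding-window top-k monitor (illustrative only, not the paper's method)."""

    def __init__(self, window_size, k):
        self.window_size = window_size        # assumed count-based window
        self.k = k
        self.window = deque()                 # doc ids, oldest first
        self.postings = defaultdict(dict)     # term -> {doc_id: weight}
        self.doc_terms = {}                   # doc_id -> {term: weight}
        self.queries = {}                     # query_id -> {term: weight}
        self.query_terms = defaultdict(set)   # term -> {query_id}
        self.results = {}                     # query_id -> [(score, doc_id)], best first

    def register(self, query_id, weights):
        """Register a continuous query and compute its initial result."""
        self.queries[query_id] = dict(weights)
        for term in weights:
            self.query_terms[term].add(query_id)
        self.results[query_id] = self._search(query_id)

    def _search(self, query_id):
        """Full top-k search over the inverted index (dot-product similarity)."""
        scores = defaultdict(float)
        for term, q_w in self.queries[query_id].items():
            for doc_id, d_w in self.postings.get(term, {}).items():
                scores[doc_id] += q_w * d_w
        return heapq.nlargest(self.k, ((s, d) for d, s in scores.items()))

    def add(self, doc_id, weights):
        """Handle a document arrival, expiring the oldest document if needed."""
        if len(self.window) == self.window_size:
            self._expire(self.window.popleft())
        self.window.append(doc_id)
        self.doc_terms[doc_id] = dict(weights)
        for term, w in weights.items():
            self.postings[term][doc_id] = w
        # Only queries sharing at least one term with the document can change.
        affected = set()
        for term in weights:
            affected |= self.query_terms.get(term, set())
        for qid in affected:
            score = sum(q_w * weights.get(t, 0.0)
                        for t, q_w in self.queries[qid].items())
            top = self.results[qid]
            # Threshold test: the current k-th best score of this query.
            if len(top) < self.k or score > top[-1][0]:
                top.append((score, doc_id))
                top.sort(reverse=True)
                del top[self.k:]

    def _expire(self, doc_id):
        """Handle an expiration: drop postings, re-search queries that lost a result."""
        for term in self.doc_terms.pop(doc_id):
            del self.postings[term][doc_id]
        for qid, top in self.results.items():
            if any(d == doc_id for _, d in top):
                self.results[qid] = self._search(qid)
```

    A monitor would be created with, for example, `ContinuousTopK(window_size=1000, k=10)`, queries added through `register`, and each streamed document passed to `add` as a term-to-weight mapping. The re-search on expiration is the costly step in this baseline; per the abstract, the paper's eager and lazy variants differ in how aggressively they manage thresholds on the inverted index.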