• DocumentCode
    1196899
  • Title

    Clustering data streams: Theory and practice

  • Author

    Guha, Sudipto ; Meyerson, Adam ; Mishra, Nina ; Motwani, Rajeev ; O'Callaghan, Liadan

  • Author_Institution
    Dept. of Comput. Sci., Pennsylvania Univ., Philadelphia, PA, USA
  • Volume
    15
  • Issue
    3
  • fYear
    2003
  • Firstpage
    515
  • Lastpage
    528
  • Abstract
    The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
  • Keywords
    data mining; facility location; learning (artificial intelligence); Web documents; approximation algorithms; clickstreams; data streams clustering; empirical evidence; real data streams; telephone records; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Data analysis; Meteorology; Partitioning algorithms; Statistics; Streaming media; Telephony; Web pages;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2003.1198387
  • Filename
    1198387