• DocumentCode
    2734894
  • Title

    Clustering data streams

  • Author

    Guha, Sudipto ; Mishra, Nina ; Motwani, Rajeev ; O´Callaghan, Liadan

  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., CA, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    359
  • Lastpage
    366
  • Abstract
    We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as Web click stream analysis and multimedia data analysis. We give constant-factor approximation algorithms for the k-median problem in the data stream model of computation in a single pass. We also show negative results implying that our algorithms cannot be improved in a certain sense
  • Keywords
    computational complexity; data analysis; deterministic algorithms; pattern clustering; very large databases; Web click stream analysis; constant-factor approximation algorithms; data stream clustering; data stream model; deterministic algorithms; k-median problem; massive data sets; multimedia data analysis; point sequence; Application software; Approximation algorithms; Clustering algorithms; Computational modeling; Computer science; Data analysis; Laboratories; Streaming media; Telephony; Web pages;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on
  • Conference_Location
    Redondo Beach, CA
  • ISSN
    0272-5428
  • Print_ISBN
    0-7695-0850-2
  • Type

    conf

  • DOI
    10.1109/SFCS.2000.892124
  • Filename
    892124