• DocumentCode
    3434214
  • Title
    Detecting clustering in streams
  • Author
    Picollelli, Michael; Boncelet, Charles; Marvel, Lisa
  • Author_Institution
    Univ. of Delaware, Newark, DE, USA
  • fYear
    2012
  • fDate
    21-23 March 2012
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    We consider an anomaly detection problem: we are interested in whether or not a stream of data contains an unusual number or distribution of positives. Abstractly, the problem can be stated as follows: given a binary string, we wish to determine whether the number or distribution of 1's differs significantly from a known spontaneous rate. Furthermore, we consider the presence of an adversary who may try to distribute the 1's into 'clusters' to fool our test. We compare tests designed to detect this type of clustering with a simple test on the number of 1's, and show that clustered data is significantly easier to detect than i.i.d. data. We show that a test on the sum of the reciprocal run lengths in a binary sequence typically performs as well as the classical Wald-Wolfowitz runs test, and significantly better in some cases. We also show that if the input stream is short, a simple additive correction term improves the detection rate of this test by a modest 1-2%.
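The abstract contrasts three statistics on a binary string: a count of 1's against a known spontaneous rate, the classical Wald-Wolfowitz runs test, and a sum of reciprocal run lengths. The following is a minimal illustrative sketch, not the authors' implementation. The function names are hypothetical; the count and runs tests use the standard normal approximations; and the reciprocal statistic is ASSUMED here to be the sum of 1/L over runs of 1's only, since the abstract does not give its exact definition. The additive small-sample correction term and any detection thresholds are not reproduced.

```python
import math
from itertools import groupby


def runs(bits):
    """Split a binary sequence into maximal runs: [(symbol, length), ...]."""
    return [(b, sum(1 for _ in g)) for b, g in groupby(bits)]


def ones_count_z(bits, p):
    """Baseline test: z-score of the number of 1's against a known
    spontaneous rate p (normal approximation to the binomial)."""
    n = len(bits)
    k = sum(bits)
    return (k - n * p) / math.sqrt(n * p * (1 - p))


def wald_wolfowitz_z(bits):
    """Classical Wald-Wolfowitz runs test: z-score of the total number of
    runs, conditioned on the observed counts of 0's and 1's.
    Strongly negative values mean too few runs, i.e. clustering."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("need at least one 0 and one 1")
    n = n0 + n1
    mean = 2 * n0 * n1 / n + 1
    var = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for a non-degenerate variance")
    return (len(runs(bits)) - mean) / math.sqrt(var)


def reciprocal_run_sum(bits):
    """ASSUMED form of the reciprocal run-length statistic: sum of 1/L over
    the runs of 1's. Each isolated 1 contributes 1, while one long cluster
    contributes little, so clustering pushes the sum down."""
    return sum(1.0 / length for b, length in runs(bits) if b == 1)


if __name__ == "__main__":
    # Same number of 1's (so the count test cannot tell them apart),
    # but arranged as one cluster vs. four isolated positives.
    clustered = [0] * 8 + [1] * 4 + [0] * 8
    spread = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
    for name, seq in (("clustered", clustered), ("spread", spread)):
        print(f"{name}: count z = {ones_count_z(seq, 0.2):.3f}, "
              f"runs z = {wald_wolfowitz_z(seq):.3f}, "
              f"reciprocal sum = {reciprocal_run_sum(seq):.3f}")
```

The example shows why a count-only test is blind to an adversary who keeps the number of 1's at the expected rate but groups them: both sequences carry four 1's in twenty symbols, yet the clustered one has far fewer runs and a much smaller reciprocal run-length sum.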
  • Keywords
    binary sequences; pattern clustering; security of data; statistical analysis; Wald-Wolfowitz test; additive correction term; anomaly detection problem; binary string sequence; data clustering detection rate improvement; data streams; positive distribution; reciprocal run length sum; spontaneous rate; unusual number; Standards; Binary sequences; anomaly detection; clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Sciences and Systems (CISS), 2012 46th Annual Conference on
  • Conference_Location
    Princeton, NJ
  • Print_ISBN
    978-1-4673-3139-5
  • Electronic_ISBN
    978-1-4673-3138-8
  • Type
    conf
  • DOI
    10.1109/CISS.2012.6310747
  • Filename
    6310747