• DocumentCode
    144150
  • Title

    Deviation Estimation between Distributed Data Streams

  • Author

    Anceaume, Emmanuelle ; Busnel, Yann

  • Author_Institution
    IRISA, Rennes, France
  • fYear
    2014
  • fDate
    13-16 May 2014
  • Firstpage
    35
  • Lastpage
    45
  • Abstract
    The analysis of massive data streams is fundamental in many monitoring applications. In particular, for network operators, it is a recurrent and crucial issue to determine whether the huge data streams received at their monitored devices are correlated, since correlation may reveal the presence of malicious activities in the network system. We propose a metric, called our metric, that allows us to evaluate the correlation between distributed streams. This metric is inspired by a classical metric in statistics and probability theory, and as such allows us to understand how observed quantities change together, and in which proportion. We then propose to estimate our metric in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the domain of values from which data items are drawn. We give upper and lower bounds on the quality of our metric, and provide both local and distributed algorithms that additively approximate our metric among n data streams by using $\mathcal{O}\left((1/\varepsilon)\log(1/\delta)\left(\log N + \log m\right)\right)$ bits of space for each of the n nodes, where N is the size of the domain from which data items are drawn, and m is the maximal stream length. To the best of our knowledge, such a metric has never been proposed so far.
  • Keywords
    computational complexity; data analysis; distributed algorithms; probability; software metrics; data item sequence; deviation estimation; distributed algorithms; distributed data stream model; lower bounds; massive data stream analysis; network system; our metric; probability theory; statistics; upper bounds; Computational modeling; Correlation; Data models; Distributed databases; Measurement; Monitoring; Vectors; DDoS attacks; data stream; deviation estimation; functional monitoring;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Computing Conference (EDCC), 2014 Tenth European
  • Conference_Location
    Newcastle
  • Type

    conf

  • DOI
    10.1109/EDCC.2014.27
  • Filename
    6821086