Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
Abstract :
With the emergence of smart grids, administrators of wide-area measurement systems face a critical challenge: analyzing and modeling streaming data with the limited resources of their embedded controllers. Streaming data can usually be modeled as a multiset in which each data item has its own frequency. In this paper, we study the problem of how to generate histograms of data items based on their frequency, so that issues such as power line tripping or line faults can be identified under these resource constraints. The primary challenge in achieving this goal with conventional methods is that keeping an individual counter for each unique type of data is too memory-consuming, slow, and costly. We describe a novel data structure and its associated algorithms, called the loglog Bloom filter, for this purpose. This data structure extends the classical Bloom filter with a recent technique called probabilistic counting, so it can effectively generate histograms for streaming data in one pass with sub-linear overhead. This method is therefore suitable for data processing in smart grids, where limited computational resources are available on the controllers. We analyze the performance, trade-offs, and capacity of this data structure, and evaluate it with real data traces collected through the frequency disturbance recorders deployed for the FNET/GridEye infrastructure. We demonstrate that this method can identify the frequencies of all unique items with high accuracy and low memory overhead, so that data outliers can be conveniently identified.
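The abstract does not spell out the construction, so the following is only an illustrative sketch of the general idea, not the authors' loglog Bloom filter. It assumes a Bloom-filter-style array of cells addressed by several hash functions, and it swaps in Morris-style approximate counters as one well-known form of probabilistic counting: each cell stores only an exponent, which needs roughly log2(log2(n)) bits to track counts up to n. The names `ApproxCountingFilter` and `frequency_histogram`, the cell count, and the number of hashes are all hypothetical choices made for this example.

```python
import hashlib
import random
from collections import Counter


class ApproxCountingFilter:
    """Hypothetical sketch: Bloom-style cell array of Morris counters."""

    def __init__(self, num_cells=1024, num_hashes=3, seed=0):
        self.num_cells = num_cells
        self.num_hashes = num_hashes
        self.cells = [0] * num_cells      # each cell stores only an exponent
        self.rng = random.Random(seed)    # seeded so runs are repeatable

    def _indices(self, item):
        # Double hashing: derive the cell indices from one SHA-256 digest.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # A set avoids updating the same cell twice for one arrival.
        return {(h1 + i * h2) % self.num_cells for i in range(self.num_hashes)}

    def add(self, item):
        for idx in self._indices(item):
            # Morris rule: bump exponent c with probability 2**-c.
            if self.rng.random() < 2.0 ** -self.cells[idx]:
                self.cells[idx] += 1

    def estimate(self, item):
        # Other items can only add to a shared cell, so the smallest
        # exponent is the least contaminated one.
        c = min(self.cells[idx] for idx in self._indices(item))
        return 2 ** c - 1


def frequency_histogram(filt, items):
    # Bucket candidate items by their estimated frequency.
    return Counter(filt.estimate(x) for x in items)
```

In this sketch the stream is consumed in one pass through `add`, and a histogram is then obtained by calling `frequency_histogram` on the candidate items of interest. The estimates are coarse (always a power of two minus one) and randomized, which is the price of the very small per-cell state; a production design would likely use a finer counter base or a different estimator.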
Keywords :
data structures; estimation theory; power engineering computing; power system control; power system faults; power system measurement; probability; smart power grids; FNET-GridEye infrastructure; computational resources; embedded controllers; frequency disturbance recorders; histogram estimation; line faults; loglog-bloom-filter; memory overhead; power line tripping; probabilistic counting; smart grid data processing; sublinear overhead; wide-area measurement systems; Data structures; Estimation; Frequency estimation; Histograms; Probabilistic logic; Radiation detectors; Smart grids; Data analysis; data structures; frequency estimation; smart grids;