• DocumentCode
    249381
  • Title
    Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure
  • Author
    Dingsheng Wan ; Yan Xiao ; Pengcheng Zhang ; Jun Feng ; Yuelong Zhu ; Qian Liu
  • Author_Institution
    Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    339
  • Lastpage
    346
  • Abstract
    Large hydrological data sets are a kind of big data that contain much hidden and potentially useful knowledge. Extracting this knowledge is necessary because it can provide more valuable hydrological information and support future hydrological forecasting. Data mining based on time series is widely used at present, and several time-series techniques exist for extracting anomalies. However, most of these techniques are not suited to large, unstable data such as hydrological big data sets. Two important problems are high fitting error after dimension reduction and low accuracy of mining results. In this work we propose a new approach to hydrological anomaly mining based on time series that combines time series symbolization with a distance measure. It introduces Feature Points Symbolic Aggregate Approximation (FP_SAX) to improve the selection of feature points, and then measures the distance between the resulting strings with Symbol Distance based Dynamic Time Warping (SD_DTW). Finally, the resulting distances are sorted. A set of dedicated experiments is performed to validate the approach. The experimental data set is based on water level data obtained from the Xiaomeikou gauge station in Taihu Lake from 1956 to 2005. The experimental results show that our approach has lower fitting error and higher accuracy.
  • Keywords
    Big Data; data mining; geophysics computing; hydrology; time series; FP_SAX; SD_DTW; Taihu Lake; Xiaomeikou gauge station; big data; distance measure; feature point selection; feature points symbolic aggregate approximation; hydrological time series anomaly mining; symbol distance based dynamic time warping; time series symbolization; water level data set; Accuracy; Big data; Data compression; Data mining; Euclidean distance; Time measurement; Time series analysis; Data Mining; Distance Measure; Hydrological Time Series; Pattern Representation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type
    conf
  • DOI
    10.1109/BigData.Congress.2014.56
  • Filename
    6906799