• DocumentCode
    2060911
  • Title

    Improved SAX time series data representation based on Relative Frequency and K-Nearest Neighbor Algorithm

  • Author

    Ahmed, Almahdi Mohammed ; Bakar, Azuraliza Abu ; Hamdan, Abdul Razak

  • Author_Institution
    Fac. of Technol. & Inf. Sci., Univ. Kebangsaan Malaysia, Bangi, Malaysia
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 1 2010
  • Firstpage
    1320
  • Lastpage
    1325
  • Abstract
    In this paper we propose a new approach based on Symbolic Aggregate approximation (SAX), called improved SAX (iSAX), for the efficient and accurate discovery of important patterns in time series data. The original SAX approach allows high-quality dimensionality reduction and lets distance measures be defined on the symbolic representation; it is based on the Piecewise Aggregate Approximation (PAA) representation for dimensionality reduction. The proposed iSAX includes the Relative Frequency and K-Nearest Neighbor (RFknn) algorithm. The main task of the algorithm is to determine a sufficient number of intervals, represented as symbols (the alphabet size), to ensure an efficient mining process and a good knowledge model without major loss of knowledge. We show that iSAX can improve the precision of the representation without losing the symbolic nature of the original SAX representation. iSAX is compared with the original SAX and PAA representations to demonstrate its quality improvement. Ten rainfall time series data sets were used. The experimental results show that iSAX gives a better representation and the minimum Euclidean distance.
  • Keywords
    approximation theory; data mining; learning (artificial intelligence); pattern clustering; time series; SAX time series data representation; high-quality dimensionality reduction; improved iSAX; k-nearest neighbor algorithm; knowledge model; minimum Euclidean distance; mining process; piecewise aggregate approximation representation; quality improvement; relative frequency; symbolic aggregate approximation; time series rainfall data sets; pre-processing and reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-8134-7
  • Type

    conf

  • DOI
    10.1109/ISDA.2010.5687092
  • Filename
    5687092
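
The abstract describes SAX as a symbolic representation built on top of PAA. As a rough illustration only, the standard SAX pipeline (z-normalize, reduce with PAA, map segment means to letters using equiprobable Gaussian breakpoints) can be sketched as below. This is not the paper's RFknn algorithm, whose details the record does not give; the function names `znorm`, `paa`, and `sax` are hypothetical, and the sketch assumes the series length is a multiple of the segment count.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no shape to encode; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        # Assumption of this sketch: only evenly divisible lengths are handled.
        raise ValueError("series length must be a positive multiple of segments")
    size = n // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax(series, segments, alphabet_size):
    """Return the SAX word for a series as a string of lowercase letters."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    # Breakpoints split the standard normal curve into equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]
    reduced = paa(znorm(series), segments)
    # Each segment mean is mapped to the letter of the region it falls in.
    return "".join(chr(ord("a") + bisect_left(breakpoints, v)) for v in reduced)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4))  # abcd
```

In this sketch `alphabet_size` is a fixed input. According to the abstract, choosing a sufficient alphabet size is the task the proposed RFknn algorithm addresses.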