Abstract:
An important problem in data mining is finding patterns in sequential data. Sequential data arise from many sources, such as biological sequences, text documents, telecommunication alarm sequences, click streams, market basket data, Web logs, and other time series. Among the most popular patterns mined from sequential data are episodes, i.e., directed acyclic graphs with labeled nodes (Mannila et al., 1997). An important subclass of episodes is formed by serial episodes, which are essentially sequences. Serial episodes are useful in many applications, including network monitoring and molecular biology. In many situations, sequential data are now produced in such volumes that they cannot be stored without great difficulty. Such sequential sources are called data streams. In this paper, we focus on finding serial episodes in data streams. To the best of our knowledge, the problem of mining serial episodes from data streams has been studied in depth only for length-1 episodes (Karp et al., 2003).
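To make the notion of a serial episode over a stream concrete, the following minimal Python sketch counts the occurrences of a single serial episode in one pass, keeping only constant state per episode so the stream itself is never stored. It assumes a simple frequency measure (non-overlapping occurrences, arbitrary gaps allowed, no time-window constraint); the function name `count_non_overlapping` is hypothetical, and the sketch is an illustration of the problem setting, not the method proposed in this paper.

```python
from typing import Hashable, Iterable, Sequence


def count_non_overlapping(stream: Iterable[Hashable], episode: Sequence[Hashable]) -> int:
    """Count non-overlapping occurrences of a serial episode in one pass.

    Assumption (illustrative only): a serial episode is an ordered sequence of
    event labels, and an occurrence is any subsequence of the stream that matches
    it in order, with gaps allowed. Greedy matching finds the earliest-ending
    occurrence each time, which maximizes the number of non-overlapping ones.
    """
    if not episode:
        raise ValueError("episode must be non-empty")
    count = 0
    pos = 0  # index of the next episode event we are waiting for
    for event in stream:  # each event is seen exactly once, as in a data stream
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                # A full occurrence has completed; restart so occurrences do not overlap.
                count += 1
                pos = 0
    return count


if __name__ == "__main__":
    # A toy alarm stream consumed through an iterator, so it cannot be re-read.
    alarms = iter("ABXCABCAC")
    print(count_non_overlapping(alarms, "ABC"))  # prints 2
```

Mining serial episodes from a stream is harder than this sketch suggests, since the candidate episodes are not known in advance and their number grows exponentially with length; the sketch only shows why a single, fixed episode can be tracked with constant memory.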
Keywords:
data analysis; data mining; directed graphs; pattern recognition; time series; Web logs; biological sequences; click streams; data streams; directed acyclic graphs; event streams; labeled nodes; market basket data; molecular biology; network monitoring; pattern mining; sequential data; sequential sources; serial episode discovery; serial episodes; telecommunication alarm sequences; text documents; Computer science; Frequency; Monitoring; Sequences